Abstract

We investigate the capacity of three symmetric quantum states in three real dimensions
to carry classical information. Several such capacities have already been defined, depending
on what operations are allowed in the sending and receiving protocols. These include the
$C_{1,1}$ capacity, which is the capacity achievable if separate measurements must be used for
each of the received states, and the $C_{1,\infty}$ capacity, which is the capacity achievable if joint
measurements are allowed on the tensor product of all the received states. We discover a
new classical information capacity of quantum channels, the adaptive capacity $C_{1,A}$, which lies
strictly between the $C_{1,1}$ and the $C_{1,\infty}$ capacities. The adaptive capacity allows what is known
as the LOCC (local operations and classical communication) model of quantum operations
for decoding the channel outputs. This model requires each of the signals to be measured
by a separate apparatus, but allows the quantum states of these signals to be measured in
stages, with the first stage partially reducing their quantum states, and where measurements
in subsequent stages may depend on the results of a classical computation taking as input the
outcomes of the first round of measurements. We also show that even in three dimensions,
with the information carried by an ensemble containing three pure states, achieving the $C_{1,1}$
capacity may require a POVM with six outcomes.

1 Introduction 

For classical channels, Shannon's theorem [16] gives the information-carrying capacity
of a channel. When one tries to generalize this to quantum channels, there are several
ways to formulate the problem, which have given rise to several different capacities. In
this paper, we consider the capacity of quantum channels to carry classical information,
with various restrictions on how the channel may be used. Several such capacities have
already been defined for quantum channels. In particular, the $C_{1,1}$ capacity, where only
tensor product inputs and tensor product measurements are allowed [?], and
the $C_{1,\infty}$ capacity, where tensor product inputs and joint measurements are allowed
[?], have both been studied extensively. We will be investigating these capacities
in connection with a specific example; namely, we analyze how these capacities behave
on a symmetric set of three quantum states in three dimensions which we call the
lifted trine states. A quantum channel of the type that Holevo [7] classifies as c-q
(classical-quantum) can be constructed from these states by allowing the sender to
choose one of these three pure states, which is then conveyed to the receiver. This
channel is simple enough that we can analyze the behavior of various capacities for
it, but it is also complicated enough to exhibit interesting behaviors which have not
been observed before. In particular, we define a new, natural, classical capacity for a
quantum channel, the $C_{1,A}$ capacity, which we also call the adaptive one-shot capacity,
and show that it is strictly between the $C_{1,1}$ capacity (also called the one-shot quantum
capacity) and the $C_{1,\infty}$ capacity (also called the Holevo capacity).

The three states we consider, the lifted trine states, are obtained by starting with
the two-dimensional quantum trine states, $(1, 0)$, $(-1/2, \sqrt{3}/2)$, $(-1/2, -\sqrt{3}/2)$,
introduced by Holevo [?] and later studied by Peres and Wootters [13]. We add a third
dimension to the Hilbert space of the trine states, and lift all of the trine states
out of the plane into this dimension by an angle of $\arcsin\sqrt{\alpha}$, so the states become
$(\sqrt{1-\alpha}, 0, \sqrt{\alpha})$, and so forth. We will be dealing with small $\alpha$ (roughly, $\alpha < 0.1$),
so that they are close to being planar. This is one of the interesting regimes. When
the trine states are lifted further out of the plane, they behave in less interesting ways
until they are close to being vertical; then they start being interesting again, but we
will not investigate this second regime.

To put this channel into the formulation of completely positive trace-preserving
operators, we let the sender start with a quantum state in a three-dimensional input
state space, measure this state using a von Neumann measurement with three outcomes,
and send one of the lifted trines $T_0$, $T_1$ or $T_2$, depending on the outcome of this
measurement. This process turns any quantum state into a probability distribution
over $T_0$, $T_1$ and $T_2$.

The first section of the paper deals with the accessible information for the lifted trine
states when the probability of all three states is equal. The accessible information of
an ensemble is the maximum mutual information obtainable between the input states
of the ensemble and the outcomes of a POVM (positive operator valued measure)
measurement on these states. The substance of this section has already appeared in
[17]. Combined with Appendix C, this shows that the number of projectors required
to achieve the $C_{1,1}$ capacity for the ensemble of lifted trines can be as large as 6, the
maximum possible by the real version of Davies' theorem. The second section deals
with the $C_{1,1}$ channel capacity (or the one-shot capacity), which is the maximum of the
accessible information over all probability distributions on the trine states. This has
often been called the $C_1$ capacity because it is the classical capacity obtained when you
are only allowed to process (i.e., encode/measure) one signal at a time. We call it $C_{1,1}$
to emphasize that you are only allowed to input tensor product states (the first 1), and
only allowed to make quantum measurements on one signal at a time (the second 1).
The third section deals with the new capacity $C_{1,A}$, the "adaptive one-shot capacity."
This is the capacity for sending classical information attainable if you are allowed to
send codewords composed of tensor products of lifted trine states, are not allowed
to make joint measurements involving more than one trine state, but are allowed to
make a measurement on one signal which only partially reduces the quantum state,
use the outcome of this measurement to determine which measurement to make on
a different signal, return to refine the measurement on the first signal, and so forth.
In Section 5 we give an upper bound on the $C_{1,1}$ capacity of the lifted trine states,
letting us show that for the lifted trine states with sufficiently small $\alpha$, this adaptive
capacity is strictly larger than the $C_{1,1}$ channel capacity. In a later section we show
that for two pure non-orthogonal states, $C_{1,A}$ is equal to $C_{1,1}$, and thus strictly less
than the Holevo capacity $C_{1,\infty}$. These two results show that $C_{1,A}$ is different
from previously defined capacities for quantum channels. To obtain a capacity larger
than $C_{1,1}$ it is necessary to make measurements that only partially reduce the state of
some of the signals, and then later return to refine the measurement on these signals
depending on the results of intervening measurements. In a later section we show that if
you use "sequential measurement", i.e., only measure one signal at a time, and never return
to a previously measured signal, it is impossible to achieve a capacity larger than $C_{1,1}$.
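We take the lifted trine states to be:

$$
\begin{aligned}
T_0(\alpha) &= \left(\sqrt{1-\alpha},\; 0,\; \sqrt{\alpha}\right) \\
T_1(\alpha) &= \left(-\tfrac{1}{2}\sqrt{1-\alpha},\; \tfrac{\sqrt{3}}{2}\sqrt{1-\alpha},\; \sqrt{\alpha}\right) \\
T_2(\alpha) &= \left(-\tfrac{1}{2}\sqrt{1-\alpha},\; -\tfrac{\sqrt{3}}{2}\sqrt{1-\alpha},\; \sqrt{\alpha}\right)
\end{aligned}
\tag{1}
$$

When it is clear what $\alpha$ is, we may drop it from the notation and use $T_0$, $T_1$, or $T_2$.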
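As a quick numerical illustration of Eq. (1), the following minimal Python sketch builds the three lifted trines and checks that they are unit vectors with a common pairwise overlap. The function names `lifted_trine` and `dot` are ours, chosen for illustration, and the overlap value $(3\alpha - 1)/2$ is our own computation from Eq. (1) rather than a formula quoted from the text.

```python
import math

def lifted_trine(b, alpha):
    """Return T_b(alpha) from Eq. (1) as a tuple (x, y, z), for b = 0, 1, 2."""
    radius = math.sqrt(1.0 - alpha)      # length of the planar component
    angle = 2.0 * math.pi * b / 3.0      # 0, 120 or 240 degrees
    return (radius * math.cos(angle), radius * math.sin(angle), math.sqrt(alpha))

def dot(u, v):
    """Real inner product of two 3-vectors."""
    return sum(x * y for x, y in zip(u, v))

if __name__ == "__main__":
    alpha = 0.05  # example lift, chosen arbitrarily in the "small alpha" regime
    trines = [lifted_trine(b, alpha) for b in range(3)]
    for b, t in enumerate(trines):
        print("T_%d =" % b, t, " norm^2 =", dot(t, t))
    # Each pair of distinct trines has overlap (3*alpha - 1)/2 (our computation).
    print(dot(trines[0], trines[1]), (3.0 * alpha - 1.0) / 2.0)
```

At $\alpha = 0$ the overlap is $-1/2$ (the planar trines), and it vanishes at $\alpha = 1/3$, where the three states become orthogonal.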



2 The Accessible Information 

In this section, we find the accessible information for the ensemble of lifted trine states,
given equal probabilities. This is defined as the maximal mutual information between
the trine states (with probabilities $\frac{1}{3}$ each) and the elements of a POVM measuring
these states. Because the trine states are vectors over the reals, it follows from the
generalization of Davies' theorem to real states (see, e.g., [14]) that there is an optimal
POVM with at most six elements, all the components of which are real. The lifted
trine states are three-fold symmetric, so by symmetrizing we can assume that the
optimal POVM is three-fold symmetric (possibly at the cost of introducing extra POVM
elements). Also, the optimal POVM can be taken to have one-dimensional elements
$E_i$, so the elements can be described as vectors $|v_i\rangle$ where $E_i = |v_i\rangle\langle v_i|$. This means
that there is an optimal POVM whose vectors come in triples of the form $\sqrt{p}\,P_0(\phi,\theta)$,
$\sqrt{p}\,P_1(\phi,\theta)$, $\sqrt{p}\,P_2(\phi,\theta)$, where $p$ is a scalar probability and

$$
\begin{aligned}
P_0(\phi,\theta) &= (\cos\phi\cos\theta,\; \cos\phi\sin\theta,\; \sin\phi) \\
P_1(\phi,\theta) &= (\cos\phi\cos(\theta + 2\pi/3),\; \cos\phi\sin(\theta + 2\pi/3),\; \sin\phi) \\
P_2(\phi,\theta) &= (\cos\phi\cos(\theta - 2\pi/3),\; \cos\phi\sin(\theta - 2\pi/3),\; \sin\phi).
\end{aligned}
\tag{2}
$$

The optimal POVM may have several such triples, which we label $\sqrt{p_1}\,P_b(\phi_1,\theta_1)$,
$\sqrt{p_2}\,P_b(\phi_2,\theta_2)$, $\ldots$, $\sqrt{p_m}\,P_b(\phi_m,\theta_m)$. It is easily seen that the conditions for this set of
vectors to be a POVM are that

$$
\sum_{i=1}^{m} p_i \sin^2(\phi_i) = 1/3 \quad\text{and}\quad \sum_{i=1}^{m} p_i = 1. \tag{3}
$$

One way to compute the accessible information is to break the formula for
accessible information into pieces so as to keep track of the amount of information
contributed to it by each triple. That is, the accessible information $I_A$ will be the
weighted average (weighted by $p_i$) of the contribution $I(\phi,\theta)$ from each $(\phi,\theta)$. To see
this, recall that $I_A$ is the mutual information between the input and the output, and
that this can be expressed as the entropy of the input less the entropy of the input
given the output, $H(X_{in}) - H(X_{in}|X_{out})$. The term $H(X_{in}|X_{out})$ naturally decomposes
into terms corresponding to the various POVM outcomes, and there are several ways of
assigning the entropy of the input $H(X_{in})$ to these POVM elements in order to complete
this decomposition. This is how I first arrived at the formula for $I_A$. I briefly sketch
this analysis, and then go into detail in a second analysis. This second analysis is
superior in that it explains the form of the answer obtained, but it is not clear how
one could discover the second analysis without first knowing the result.

For each $\phi$, and each $\alpha$, there is a $\theta$ that optimizes $I(\phi,\theta)$. This $\theta$ starts out at
$\pi/6$ for $\phi = 0$, decreases until it hits 0 at some value of $\phi$ (which depends on $\alpha$),
and stays at 0 until $\phi$ reaches its maximum value of $\pi/2$. For a fixed $\alpha$, by finding
(numerically) the optimal value of $\theta$ for each $\phi$ and using it to obtain the contribution
to $I_A$ attributable to that $\phi$, we get a curve giving the optimal contribution to $I_A$ for
each $\phi$. If this curve is plotted, with the x-value being $\sin^2\phi$ and the y-value being the
contribution to $I_A$, an optimal POVM can be obtained by finding the set of points on
this curve whose average x-value is 1/3 (from Eq. (3)), and whose average y-value is as
large as possible. A convexity argument shows that we only need at most two points
from the curve to obtain this optimum; we will need one or two points depending on
whether the relevant part of the curve is concave or convex. For small $\alpha$, it can be
seen numerically that the relevant piece of the curve is convex, and we need two $\phi$'s
to achieve the maximum. One of the $(\phi,\theta)$ pairs is $(0, \pi/6)$, and the other is $(\phi_\alpha, 0)$
for some $\phi_\alpha > \arcsin(1/\sqrt{3})$. The formula for this $\phi_\alpha$ will be derived later. Each of
these $\phi$'s corresponds to a triple of POVM elements, giving six elements for the optimal
POVM.

The analysis in the remainder of this section gives a different way of describing
this six-outcome POVM. This analysis unifies the measurements for different $\alpha$, and
also introduces some of the methods that will appear again in Section 4. Consider
the following measurement protocol. For small $\alpha$ ($\alpha < \gamma_1$ for some constant $\gamma_1$), we
first take the trine $T_b(\alpha)$ and make a partial measurement which either projects it
onto the $x,y$ plane or lifts it further out of this plane so that it becomes the trine
$T_b(\gamma_1)$. (Here $\gamma_1$ is independent of $\alpha$.) If the measurement projects the trine into the
$x,y$ plane, we make a second measurement using the POVM having outcome vectors
$\sqrt{2/3}\,(0, 1)$ and $\sqrt{2/3}\,(\pm\sqrt{3}/2, -1/2)$. This is the optimal POVM for trines in the $x,y$
plane. If the measurement lifts the trine further out of the $x,y$ plane, we use the
von Neumann measurement that projects onto the basis consisting of $(\sqrt{2/3}, 0, \sqrt{1/3})$,
$(-\sqrt{1/6}, \pm\sqrt{1/2}, \sqrt{1/3})$. If $\alpha$ is larger than $\gamma_1$ (but smaller than 8/9), we skip the
first partial measurement, and just use the above von Neumann measurement. Here,
$\gamma_1$ is obtained by numerically solving a fairly complicated equation; we suspect that
no closed form expression for it exists. The value of $\gamma_1$ is 0.061367, which is $\sin^2\phi$ for
$\phi \approx 0.25033$ radians (14.343°).

We now give more details on this decomposition of the POVM into a two-step
process. We first apply a partial measurement which does not extract all of the quantum
information, i.e., it leaves a quantum residual state that is not completely determined
by the measurement outcome. Formally, we apply one of a set of matrices $A_i$ satisfying
$\sum_i A_i^\dagger A_i = I$. If we start with a pure state $|v\rangle$, we observe the $i$'th outcome with
probability $\langle v|A_i^\dagger A_i|v\rangle$, and in this case the state $|v\rangle$ is taken to the state $A_i|v\rangle$. We
choose as the $A_i$'s the matrices $\sqrt{p_i}\,M(\phi_i)$ where

$$
M(\phi) = \begin{pmatrix}
\sqrt{3/2}\,\cos\phi & 0 & 0 \\
0 & \sqrt{3/2}\,\cos\phi & 0 \\
0 & 0 & \sqrt{3}\,\sin\phi
\end{pmatrix}. \tag{4}
$$

The $\sqrt{p_i}\,M(\phi_i)$ form a valid partial measurement if and only if

$$
\sum_i p_i \sin^2(\phi_i) = 1/3 \quad\text{and}\quad \sum_i p_i = 1.
$$

By first applying the above $\sqrt{p_i}\,M(\phi_i)$, and then applying the von Neumann
measurement with the three basis vectors

$$
\begin{aligned}
V_0(\theta) &= \left(\sqrt{2/3}\,\cos\theta,\; \sqrt{2/3}\,\sin\theta,\; \sqrt{1/3}\right) \\
V_1(\theta) &= \left(\sqrt{2/3}\,\cos(\theta + 2\pi/3),\; \sqrt{2/3}\,\sin(\theta + 2\pi/3),\; \sqrt{1/3}\right) \\
V_2(\theta) &= \left(\sqrt{2/3}\,\cos(\theta - 2\pi/3),\; \sqrt{2/3}\,\sin(\theta - 2\pi/3),\; \sqrt{1/3}\right)
\end{aligned}
\tag{5}
$$

we obtain the POVM given by the vectors $\sqrt{p_i}\,P_b(\phi_i,\theta_i)$ of Eq. (2); checking this is
simply a matter of verifying that $V_b(\theta)M(\phi) = P_b(\phi,\theta)$. Now, after applying $\sqrt{p_i}\,M(\phi_i)$
to the trine $T_0(\alpha)$, we get the vector

$$
\left(\sqrt{3/2}\,\sqrt{1-\alpha}\,\sqrt{p_i}\,\cos\phi_i,\; 0,\; \sqrt{3}\,\sqrt{\alpha}\,\sqrt{p_i}\,\sin\phi_i\right). \tag{6}
$$

This is just the state $\sqrt{p'_i}\,T_0(\alpha'_i)$ where $T_0(\alpha'_i)$ is the trine state with

$$
\alpha'_i = \frac{\alpha\sin^2\phi_i}{\alpha\sin^2\phi_i + \frac{1}{2}(1-\alpha)\cos^2\phi_i} \tag{7}
$$

and where

$$
p'_i = 3p_i\left(\alpha\sin^2\phi_i + \tfrac{1}{2}(1-\alpha)\cos^2\phi_i\right) \tag{8}
$$

is the probability that we observe this trine state, given that we started with $T_0(\alpha)$.
Similar formulae hold for the trine states $T_1$ and $T_2$. We compute that

$$
\sum_i p'_i\,\alpha'_i = \sum_i 3p_i\,\alpha\sin^2(\phi_i) = \alpha. \tag{9}
$$
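The identities in this two-step decomposition are easy to check numerically. The sketch below is ours and only illustrative: it picks a hypothetical POVM made of two triples (the values $\alpha = 0.03$ and $\sin^2\phi_2 = 0.6$ are arbitrary example choices, not values from the text), enforces the two conditions of Eq. (3), and then verifies the completeness of the partial measurement, the identity $V_b(\theta)M(\phi) = P_b(\phi,\theta)$, and Eq. (9).

```python
import math

def m_diagonal(phi):
    """Diagonal entries of the matrix M(phi) from Eq. (4)."""
    planar = math.sqrt(1.5) * math.cos(phi)
    return (planar, planar, math.sqrt(3.0) * math.sin(phi))

def v_vector(b, theta):
    """Basis vector V_b(theta) from Eq. (5)."""
    angle = theta + 2.0 * math.pi * b / 3.0
    return (math.sqrt(2.0 / 3.0) * math.cos(angle),
            math.sqrt(2.0 / 3.0) * math.sin(angle),
            math.sqrt(1.0 / 3.0))

def p_vector(b, phi, theta):
    """POVM direction P_b(phi, theta) from Eq. (2)."""
    angle = theta + 2.0 * math.pi * b / 3.0
    return (math.cos(phi) * math.cos(angle), math.cos(phi) * math.sin(angle), math.sin(phi))

def alpha_prime(alpha, phi):
    """Lift of the residual trine after the partial measurement, Eq. (7)."""
    s, c = math.sin(phi) ** 2, math.cos(phi) ** 2
    return alpha * s / (alpha * s + 0.5 * (1.0 - alpha) * c)

def p_prime(p, alpha, phi):
    """Probability of the corresponding first-stage outcome, Eq. (8)."""
    s, c = math.sin(phi) ** 2, math.cos(phi) ** 2
    return 3.0 * p * (alpha * s + 0.5 * (1.0 - alpha) * c)

def two_stage_check(alpha=0.03, sin2_phi=0.6):
    """Check Eqs. (3)-(9) for two triples; sin2_phi must be at least 1/3."""
    phis = [0.0, math.asin(math.sqrt(sin2_phi))]
    p_second = 1.0 / (3.0 * sin2_phi)      # enforces sum_i p_i sin^2(phi_i) = 1/3
    ps = [1.0 - p_second, p_second]        # enforces sum_i p_i = 1
    # Diagonal of sum_i p_i M(phi_i)^T M(phi_i); should equal (1, 1, 1).
    completeness = [sum(p * m_diagonal(phi)[k] ** 2 for p, phi in zip(ps, phis))
                    for k in range(3)]
    total = sum(p_prime(p, alpha, phi) for p, phi in zip(ps, phis))
    mean_lift = sum(p_prime(p, alpha, phi) * alpha_prime(alpha, phi)
                    for p, phi in zip(ps, phis))
    theta = 0.4                            # arbitrary rotation angle for the test
    worst = max(abs(v_vector(b, theta)[k] * m_diagonal(phi)[k] - p_vector(b, phi, theta)[k])
                for b in range(3) for phi in phis for k in range(3))
    return completeness, total, mean_lift, worst

if __name__ == "__main__":
    print(two_stage_check())
```

The third returned value reproduces $\alpha$, as Eq. (9) requires; this conservation of the average lift is what allows the accessible information to be read off as a weighted average over the residual trines.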



The first stage of this process, the partial measurement which applies the matrices
$\sqrt{p_i}\,M(\phi_i)$, reveals no information about which of $T_0$, $T_1$, $T_2$ we started with. Thus, by
the chain rule for classical Shannon information [2], the accessible information obtained
by our two-stage measurement is just the weighted average (the weights being $p'_i$) of
the maximum over $\theta$ of the Shannon mutual information $I_{\alpha'_i}(\theta)$ between the outcome
of the von Neumann measurement $V(\theta)$ and the trine $T(\alpha'_i)$. By convexity, it suffices
to use only two values of $\alpha'_i$ to obtain this maximum. In fact, the optimum is obtained
using either one or two values of $\alpha'_i$ depending on whether the function

$$
I_{\alpha'} = \max_\theta I_{\alpha'}(\theta)
$$

is concave or convex over the appropriate region. In the remainder of this section, we
give the results of computing (numerically) the values of this function $I_{\alpha'}$. For small
enough $\alpha$ it is convex, so that we need two values of $\alpha'$, corresponding to a POVM
with six outcomes.

Figure 1: The value of $\theta$ maximizing $I_\alpha$ for $\alpha$ between 0 and 0.07. This function starts
at $\pi/6$ at $\alpha = 0$, decreases until it hits 0 at $\alpha = 0.056651$ and stays at 0 for larger $\alpha$.

We need to calculate the Shannon capacity of the classical channel whose input
is one of the three trine states $T(\alpha')$, and whose output is determined by the von
Neumann measurement $V(\theta)$. Because of the symmetry, we can calculate this using
only the first projector $V_0$. The Shannon mutual information between the input and
the output is $H(X_{in}) - H(X_{in}|X_{out})$, which is

$$
I_{\alpha'}(\theta) = \log_2 3 + \sum_{b=0}^{2} |\langle V_0(\theta)|T_b(\alpha')\rangle|^2 \log_2 |\langle V_0(\theta)|T_b(\alpha')\rangle|^2. \tag{10}
$$
Figure 2: This plot shows $I_\alpha(\theta)$ for $0 < \alpha < 0.07$ and various $\theta$. This is the mutual
information between the lifted trines at an angle of $\arcsin\sqrt{\alpha}$ to the x-y plane, and a
von Neumann measurement rotated with respect to these trines by an angle $\theta$. The
green curve AZ is $I_\alpha(0)$ and the green curve BV is $I_\alpha(\pi/6)$. The $\theta = 0$ curve is optimal
for $\alpha > 0.056651$, and $\theta = \pi/6$ is optimal for $\alpha = 0$. The dashed yellow curves show
$I_\alpha(\theta)$ for $\theta$ at intervals of 3° between 0° and 30° ($\pi/6$ radians). Finally, the red curve
BZ shows $I_\alpha(\theta_{opt})$ for those $\alpha$ where neither 0 nor $\pi/6$ is the optimal $\theta$. The function
$\theta_{opt}$ is given in Figure 1. It is hard to see from this plot, but the red curve BZ is
slightly convex, i.e., its second derivative is positive. This is clearer in Fig. 3.

The $\theta$ giving the maximum $I_{\alpha'}$ is $\pi/6$ for $\alpha' = 0$, decreases continuously to 0 at $\alpha' =
0.056651$, and remains 0 for larger $\alpha'$. (See Fig. 1.) This value 0.056651 corresponds to
an angle of 0.24032 radians (13.769°). This $\theta$ was determined by using the computer
package Maple to numerically find the point at which $dI_{\alpha'}(\theta)/d\theta = 0$.
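The behavior of the optimal rotation angle can be reproduced with a few lines of code. The sketch below is not the Maple computation referred to above; it is a minimal stand-in that evaluates Eq. (10) directly and replaces the root-finding on $dI_{\alpha'}(\theta)/d\theta$ by a plain grid search over $\theta \in [0, \pi/3]$, which suffices because of the reflection and three-fold symmetries of the problem. All function names are ours.

```python
import math

def lifted_trine(b, alpha):
    """T_b(alpha) from Eq. (1)."""
    radius = math.sqrt(1.0 - alpha)
    angle = 2.0 * math.pi * b / 3.0
    return (radius * math.cos(angle), radius * math.sin(angle), math.sqrt(alpha))

def v0(theta):
    """First basis vector V_0(theta) from Eq. (5)."""
    return (math.sqrt(2.0 / 3.0) * math.cos(theta),
            math.sqrt(2.0 / 3.0) * math.sin(theta),
            math.sqrt(1.0 / 3.0))

def mutual_information(alpha, theta):
    """I_alpha(theta) in bits, Eq. (10), for equiprobable lifted trines."""
    total = math.log2(3.0)
    v = v0(theta)
    for b in range(3):
        prob = sum(x * y for x, y in zip(v, lifted_trine(b, alpha))) ** 2
        if prob > 0.0:                      # the term p*log2(p) vanishes at p = 0
            total += prob * math.log2(prob)
    return total

def best_theta(alpha, steps=2000):
    """Grid-search maximizer of I_alpha(theta) over [0, pi/3]."""
    grid = [(math.pi / 3.0) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda th: mutual_information(alpha, th))

if __name__ == "__main__":
    for alpha in (0.0, 0.01, 0.03, 0.05, 0.065):
        th = best_theta(alpha)
        print(alpha, round(math.degrees(th), 2), round(mutual_information(alpha, th), 5))
```

For $\alpha = 0$ the search returns $\theta = \pi/6$ with $I = \log_2 3 - 1$, and for $\alpha$ above the threshold quoted in the text it returns $\theta = 0$; the grid resolution limits the accuracy of the angle to about $\pi/(3 \cdot 2000)$ radians.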

By plugging the optimum $\theta$ into the formula for $I_{\alpha'}$, we obtain the optimum von
Neumann measurement of the form $V$ above. We believe this is also the optimal generic
von Neumann measurement, but have not proved this. The maximum of $I_{\alpha'}(\theta)$ over $\theta$,
and curves that show the behavior of $I_{\alpha'}(\theta)$ for constant $\theta$, are plotted in Fig. 2. We
can now observe that the leftmost piece of the curve is convex, and thus that for small
$\alpha$ the best POVM will have six projectors, corresponding to two values of $\alpha'$. For trine
states with $0 < \alpha < 0.061367$, the two values of $\alpha'$ giving the maximum accessible
information are 0 and 0.061367; we call this second value $\gamma_1$. The trine states $T(\gamma_1)$
make an angle of 0.25033 radians (14.343°) with the x-y plane.

Figure 3: This graph contains three curves. As in Fig. 2, the green curve AZ is $I_\alpha(0)$
and the red curve BZ is the maximum over $\theta$ of $I_\alpha(\theta)$ for $\alpha < 0.056651$ (for larger
$\alpha$, this maximum is the green curve). The blue line BZ is straight; it is the convex
envelope of the red and green curves and lies slightly above the red curve BZ. This
blue line is a linear interpolation between $\alpha = 0$ and $\alpha = 0.061367$ and corresponds to
a POVM having six elements. It gives the accessible information for the lifted trine
states $T(\alpha)$ when $0 < \alpha < 0.061367$. The difference between the blue and red curves
is maximum at $\alpha = 0.024831$, when this difference reaches 0.0038282.

We can now invert the formula for $\alpha'$ (Eq. (7)) to obtain a formula for $\sin^2\phi$, and
substitute the value of $\alpha' = \gamma_1$ back into the formula to obtain the optimal POVM.
We find

$$
\sin^2(\phi_\alpha) = \frac{1-\alpha}{1 + \frac{2 - 3\gamma_1}{\gamma_1}\,\alpha} \approx \frac{1-\alpha}{1 + 29.591\,\alpha} \tag{11}
$$

where $\gamma_1 \approx 0.061367$ as above. Thus, the elements in the optimal POVM we have
found for the trines $T(\alpha)$, when $\alpha < \gamma_1$, are the six vectors $P_b(\phi_\alpha, 0)$ and $P_b(0, \pi/6)$,
where $\phi_\alpha$ is given by Eq. (11) and $b = 0, 1, 2$. Fig. 3 plots the accessible information given
by this six-outcome POVM, and compares it to the accessible information obtained by
the best known von Neumann measurement.
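The six-outcome POVM can be assembled and evaluated directly. The following sketch is ours and makes two assumptions that should be read as such: it takes the quoted value $\gamma_1 = 0.061367$ as given rather than recomputing it, and it compares against the best von Neumann measurement of the form $V(\theta)$ found by grid search, using the observation that $V_b(\theta) = P_b(\phi, \theta)$ when $\sin^2\phi = 1/3$. It checks that the six vectors form a POVM and that their mutual information is the linear interpolation between $I_0(\pi/6)$ and $I_{\gamma_1}(0)$.

```python
import math

GAMMA_1 = 0.061367                       # constant quoted in the text
PHI_VN = math.asin(math.sqrt(1.0 / 3.0))  # P_b(PHI_VN, theta) coincides with V_b(theta)

def lifted_trine(b, alpha):
    """T_b(alpha) from Eq. (1)."""
    r, a = math.sqrt(1.0 - alpha), 2.0 * math.pi * b / 3.0
    return (r * math.cos(a), r * math.sin(a), math.sqrt(alpha))

def p_vector(b, phi, theta):
    """P_b(phi, theta) from Eq. (2)."""
    a = theta + 2.0 * math.pi * b / 3.0
    return (math.cos(phi) * math.cos(a), math.cos(phi) * math.sin(a), math.sin(phi))

def povm_information(alpha, triples):
    """Mutual information (bits) between equiprobable trines T_x(alpha) and a POVM
    given as triples (p, phi, theta), i.e. vectors sqrt(p) * P_b(phi, theta)."""
    info = 0.0
    for p, phi, theta in triples:
        for b in range(3):
            v = p_vector(b, phi, theta)
            joint = [p * sum(s * t for s, t in zip(v, lifted_trine(x, alpha))) ** 2 / 3.0
                     for x in range(3)]              # Pr[input x, this outcome]
            p_out = sum(joint)
            info += sum(j * math.log2(3.0 * j / p_out) for j in joint if j > 0.0)
    return info

def povm_operator_sum(triples):
    """Sum of p |P_b><P_b| over all outcomes; the identity matrix for a valid POVM."""
    total = [[0.0] * 3 for _ in range(3)]
    for p, phi, theta in triples:
        for b in range(3):
            v = p_vector(b, phi, theta)
            for r in range(3):
                for c in range(3):
                    total[r][c] += p * v[r] * v[c]
    return total

def six_outcome_povm(alpha):
    """Triples of the six-outcome POVM, intended for 0 < alpha < GAMMA_1."""
    ratio = (2.0 - 3.0 * GAMMA_1) / GAMMA_1          # about 29.591
    sin2 = (1.0 - alpha) / (1.0 + ratio * alpha)     # Eq. (11)
    p_lifted = 1.0 / (3.0 * sin2)                    # first condition of Eq. (3)
    return [(1.0 - p_lifted, 0.0, math.pi / 6.0),
            (p_lifted, math.asin(math.sqrt(sin2)), 0.0)]

def best_von_neumann(alpha, steps=2000):
    """Grid-search maximum of I_alpha(theta) over theta in [0, pi/3]."""
    return max(povm_information(alpha, [(1.0, PHI_VN, (math.pi / 3.0) * k / steps)])
               for k in range(steps + 1))

if __name__ == "__main__":
    alpha = 0.024831
    print(povm_information(alpha, six_outcome_povm(alpha)), best_von_neumann(alpha))
```

At $\alpha = 0.024831$ the six-outcome value should exceed the von Neumann value by roughly the 0.0038 bits reported in the caption of Fig. 3.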

We also prove there are no other POVMs which attain the same accessible
information. The argument above shows that any optimal POVM must contain only
projectors chosen from these six vectors: only those two values of $\alpha'$ can appear in
the measurement giving maximum capacity, and for each of these values of $\alpha'$ there
are only three projectors in $V(\theta)$ which can maximize $I_{\alpha'}$ for these $\alpha'$. It is easy to
check that there is only one set of probabilities $p_i$ which make the above six vectors
into a POVM, and that none of these probabilities are 0 for $0 < \alpha < \gamma_1$. Thus, for
the lifted trine states with $0 < \alpha < \gamma_1$, there is only one POVM maximizing accessible
information, and it contains six elements, the maximum possible for real states by a
generalization of Davies' theorem [14].



3 The $C_{1,1}$ Capacity

In this section, we discuss the $C_{1,1}$ capacity (or one-shot capacity) of the lifted trine
states. This is the maximum of the accessible information over all probability
distributions of the lifted trine states. Because the trine states are real vectors, it follows
from a version of Davies' theorem that there is an optimal POVM with at most six
elements, all the components of which are real. Since the lifted trine states are three-fold
symmetric, one might expect that the solution maximizing $C_{1,1}$ capacity is also three-fold
symmetric. However, unlike accessible information, for $C_{1,1}$ capacity a symmetric
problem does not mean that the optimal probabilities and the optimal measurement
can be made symmetric. Indeed, for the planar trine states, it is known that they
cannot. The optimal $C_{1,1}$ capacity for the planar trine states $T(0)$ is obtained by
assigning probability 1/2 to two of the three states and not using the third one at all.
(See Appendix A.) This gives a channel capacity of $1 - H(1/2 - \sqrt{3}/4) = 0.64542$
bits, where $H(x) = -x\log_2 x - (1-x)\log_2(1-x)$ is the binary Shannon entropy. As
discussed in the previous section, the accessible information when all three trine states
have equal probability is $\log_2 3 - 1 = 0.58496$ bits.

In this section, we first discuss the best measurement we have found to date. We
believe this is likely to be the optimal measurement, but do not have a proof of this.
Later, we will discuss what we can actually prove; namely, that as $\alpha$ approaches 0 (i.e.,
for nearly planar trine states), the actual $C_{1,1}$ capacity becomes exponentially close to
the value given by our conjectured optimal measurement. We postpone this proof to
Section 5 so that in Section 4 we can complete our presentation of the various channel
capacities of the lifted trine states by giving an adaptive protocol that improves on our
conjectured $C_{1,1}$ capacity. Together with the bounds in Section 5, this lets us prove
that the adaptive capacity $C_{1,A}$ is strictly larger than $C_{1,1}$.
Our starting point is the $C_{1,1}$ capacity for planar trines. The optimum probability
distribution uses just two of the three trines. For two pure states, $|v_1\rangle$ and $|v_2\rangle$, the
optimum measurement for $C_{1,1}$ is known. Let the states have an angle $\theta$ between them,
so that $|\langle v_1|v_2\rangle|^2 = \cos^2\theta$. We can then take the two states to be $v_1 = (\cos\frac{\theta}{2}, \sin\frac{\theta}{2})$
and $v_2 = (\cos\frac{\theta}{2}, -\sin\frac{\theta}{2})$. The optimal measurement is the von Neumann measurement
with projectors $P_\pm = (1/\sqrt{2}, \pm 1/\sqrt{2})$. This measurement induces a classical binary
symmetric channel with error probability

$$
\langle P_+|v_2\rangle^2 = \cos^2(\theta/2 + \pi/4) = \frac{1 - \sin\theta}{2},
$$

and the $C_{1,1}$ capacity is thus $1 - H(\frac{1}{2} - \frac{1}{2}\sin\theta)$. Thus, for the planar trines, the
$C_{1,1}$ capacity is $1 - H(1/2 - \sqrt{3}/4) = 0.64542$. To obtain our best guess for the
$C_{1,1}$ capacity of the lifted trines with $\alpha$ small, we will give three successively better
guesses at the optimal probability distribution and measurement. For small $\alpha$, we
know of nothing better than the third guess, which we conjecture to be optimal when
$\alpha < 0.018073$. John Smolin has tried searching for solutions using a hill-climbing
optimization program, and failed to find any better measurement for $C_{1,1}$, although
the program did converge to the best known value a significant fraction of the time
[?].
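The two-state formula is simple enough to evaluate directly. The sketch below is ours: it computes the error probability from the vectors $v_2$ and $P_+$ given above, compares it with $(1 - \sin\theta)/2$, and evaluates the capacity for the planar trines, whose mutual angle is $\theta = 2\pi/3$.

```python
import math

def binary_entropy(x):
    """H(x) = -x log2 x - (1 - x) log2 (1 - x), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def error_probability(theta):
    """<P_+|v_2>^2 for two states separated by the angle theta."""
    v2 = (math.cos(theta / 2.0), -math.sin(theta / 2.0))
    p_plus = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))
    return (p_plus[0] * v2[0] + p_plus[1] * v2[1]) ** 2

def two_state_capacity(theta):
    """Capacity of the induced binary symmetric channel, 1 - H(error)."""
    return 1.0 - binary_entropy(error_probability(theta))

if __name__ == "__main__":
    theta = 2.0 * math.pi / 3.0                       # angle between two planar trines
    print(error_probability(theta), (1.0 - math.sin(theta)) / 2.0)
    print(two_state_capacity(theta))                  # about 0.64542 bits
    print(math.log2(3.0) - 1.0)                       # about 0.58496 bits, equal priors
```

The comparison in the last two lines makes the point of this section concrete: dropping one of the three planar trines gives a larger capacity than using all three with equal probability.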

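For the trines $T(0)$, the optimum probability distribution is $(\frac{1}{2}, \frac{1}{2}, 0)$. Our first guess
is to continue to use the same probability distribution for $\alpha > 0$. For the trines $T(\alpha)$
with this probability distribution, $(\frac{1}{2}, \frac{1}{2}, 0)$, the optimum measurement is a von Neumann
measurement with projectors

$$
\begin{aligned}
Q_0(\beta) &= \left(\sqrt{\beta},\; 0,\; \sqrt{1-\beta}\right) \\
Q_1(\beta) &= \tfrac{1}{\sqrt{2}}\left(-\sqrt{1-\beta},\; 1,\; \sqrt{\beta}\right) \\
Q_2(\beta) &= \tfrac{1}{\sqrt{2}}\left(-\sqrt{1-\beta},\; -1,\; \sqrt{\beta}\right)
\end{aligned}
\tag{12}
$$

where $\beta = 4\alpha/(3\alpha + 1)$. The $C_{1,1}$ capacity in this case is $1 - H(p)$ where
$p = \frac{1}{2} - \frac{1}{2}\sqrt{1 - \frac{1}{4}(3\alpha - 1)^2}$, which is the two-state error probability $(1 - \sin\theta)/2$ for the
overlap $\langle T_1|T_2\rangle = (3\alpha - 1)/2$. This function is plotted in Fig. 6. We call this the two-trine capacity.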
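The first guess can be checked numerically from the vectors alone. The sketch below is ours and should be read as an illustration under stated assumptions: the projectors follow Eq. (12), the two used trines are taken to be $T_1$ and $T_2$, and the closed form $\frac{1}{2} - \frac{1}{4}\sqrt{(1+3\alpha)(3-3\alpha)}$ for the error probability is our own simplification of the two-state formula. It verifies that $Q(\beta)$ is an orthonormal basis, that $Q_0$ is orthogonal to $T_1$ when $\beta = 4\alpha/(3\alpha+1)$, and that the capacity reduces to 0.64542 bits at $\alpha = 0$.

```python
import math

def lifted_trine(b, alpha):
    """T_b(alpha) from Eq. (1)."""
    r, a = math.sqrt(1.0 - alpha), 2.0 * math.pi * b / 3.0
    return (r * math.cos(a), r * math.sin(a), math.sqrt(alpha))

def dot(u, v):
    """Real inner product of two 3-vectors."""
    return sum(x * y for x, y in zip(u, v))

def binary_entropy(x):
    """Binary Shannon entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def q_basis(beta):
    """The projectors Q_0(beta), Q_1(beta), Q_2(beta) of Eq. (12)."""
    a, c, s = math.sqrt(beta), math.sqrt(1.0 - beta), 1.0 / math.sqrt(2.0)
    return [(a, 0.0, c), (-s * c, s, s * a), (-s * c, -s, s * a)]

def two_trine_capacity(alpha):
    """Return (capacity, error probability, leak of T_1 into Q_0) for the first guess."""
    beta = 4.0 * alpha / (3.0 * alpha + 1.0)
    q0, q1, q2 = q_basis(beta)
    t1 = lifted_trine(1, alpha)
    error = dot(q2, t1) ** 2           # probability that T_1 is read as outcome Q_2
    leak = dot(q0, t1) ** 2            # should vanish for this choice of beta
    return 1.0 - binary_entropy(error), error, leak

if __name__ == "__main__":
    for alpha in (0.0, 0.01, 0.02, 0.05):
        print(alpha, two_trine_capacity(alpha))
```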

Figure 4: The classical channel induced by the measurement $Q(4\alpha/(3\alpha+1))$ on the lifted trines
$T(\alpha)$. The inputs, from top to bottom, correspond to $T_1$, $T_2$, and $T_0$; the outputs correspond
to $Q_1$, $Q_2$ and $Q_0$. The transition probabilities are given above, where $\delta = \langle Q_0|T_0\rangle^2$ and
$q = \langle Q_1|T_2\rangle^2 = \langle Q_2|T_1\rangle^2$.

The second guess comes from using the same measurement, $Q(\beta)$, as the first guess,
but varying the probabilities of the three trine states so as to maximize the $C_{1,1}$ capacity
obtained using this measurement. To do this, we need to consider the classical channel
shown in Fig. 4. Because of the symmetry of this channel, the optimal probability
distribution is guaranteed to give equal probabilities to trines $T_1$ and $T_2$. Remarkably,
this channel has a closed form for the probability $p$ for the third trine which maximizes
the mutual information. Expressing the mutual information as $H(X_{out}) - H(X_{out}|X_{in})$,
we find that this simplifies to

$$
(1-p)(1 - H(q)) + H(p\delta) - pH(\delta) \tag{13}
$$

where $\delta = \langle Q_0|T_0\rangle^2$ and $q = \langle Q_2|T_1\rangle^2 = \langle Q_1|T_2\rangle^2$. Taking the derivative of this
function with respect to $p$, setting it to 0, and moving the terms with $p$ in them to the
left side of the equality gives

$$
\delta\left[\log_2(1 - p\delta) - \log_2(p)\right] = 1 - H(q) - (1-\delta)\log_2(1-\delta). \tag{14}
$$

Dividing by $\delta$ and exponentiating both sides gives

$$
\frac{1 - \delta p}{p} = 2^{\frac{1}{\delta}\left(1 - H(q) - (1-\delta)\log_2(1-\delta)\right)} \tag{15}
$$

which has the solution

$$
p = \frac{1}{\delta + 2^{\frac{1}{\delta}\left[1 - H(q) - (1-\delta)\log_2(1-\delta)\right]}}. \tag{16}
$$

Using this value of $p$, and the measurement of Eq. (12) with $\beta = 4\alpha/(3\alpha+1)$, we obtain a
curve that is plotted in Fig. 6. Note that as $\alpha$ goes to 0, $\delta$ goes to 0 and the exponential
on the right side goes to $2^{(1 - H(q))/\delta}$, so $p$ becomes exponentially small. It follows that
this function differs from the two-trine capacity by an exponentially small amount as $\alpha$
approaches 0. Note also that no matter how small $\delta$ is, the above value of $p$ is non-zero,
so even though the two-trine capacity is exponentially close to the above capacity, it
is not equal.
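The closed form of Eq. (16) can be tested against the mutual information of Eq. (13). In the sketch below, which is ours, $\delta$ and $q$ are computed from the vectors of Eqs. (1) and (12) rather than from closed forms, and Eq. (16) is evaluated in the algebraically equivalent form $p = 2^{-E}/(1 + \delta\,2^{-E})$, with $E$ the exponent of Eq. (15), so that very small $\alpha$ leads to a harmless underflow instead of a floating-point overflow. The example value $\alpha = 0.02$ is an arbitrary choice.

```python
import math

def lifted_trine(b, alpha):
    """T_b(alpha) from Eq. (1)."""
    r, a = math.sqrt(1.0 - alpha), 2.0 * math.pi * b / 3.0
    return (r * math.cos(a), r * math.sin(a), math.sqrt(alpha))

def q_basis(beta):
    """Projectors Q_0, Q_1, Q_2 of Eq. (12)."""
    a, c, s = math.sqrt(beta), math.sqrt(1.0 - beta), 1.0 / math.sqrt(2.0)
    return [(a, 0.0, c), (-s * c, s, s * a), (-s * c, -s, s * a)]

def dot(u, v):
    """Real inner product of two 3-vectors."""
    return sum(x * y for x, y in zip(u, v))

def h2(x):
    """Binary Shannon entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def channel_parameters(alpha):
    """Return (delta, q) for the measurement Q(4*alpha/(3*alpha + 1))."""
    q0, q1, q2 = q_basis(4.0 * alpha / (3.0 * alpha + 1.0))
    delta = dot(q0, lifted_trine(0, alpha)) ** 2
    q = dot(q2, lifted_trine(1, alpha)) ** 2
    return delta, q

def information(p, delta, q):
    """Mutual information of Eq. (13) when the third trine has probability p."""
    return (1.0 - p) * (1.0 - h2(q)) + h2(p * delta) - p * h2(delta)

def optimal_p(delta, q):
    """Eq. (16), rewritten as 2^(-E) / (1 + delta * 2^(-E)) to avoid overflow."""
    exponent = (1.0 - h2(q) - (1.0 - delta) * math.log2(1.0 - delta)) / delta
    w = 2.0 ** (-exponent)
    return w / (1.0 + delta * w)

if __name__ == "__main__":
    delta, q = channel_parameters(0.02)
    p = optimal_p(delta, q)
    print("delta =", delta, "q =", q, "p =", p)
    print("second guess:", information(p, delta, q), " two-trine:", information(0.0, delta, q))
```

Because mutual information is concave in the input distribution, the stationary point given by Eq. (16) is the maximum, which the comparison with nearby values of $p$ confirms.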

For our third guess, we refine the above solution slightly. It turns out that the $\beta$
used to determine the measurement $Q(\beta)$ is no longer optimal after we have given a
non-zero probability to the third trine state. What we do is vary both $p$ and $\beta$ to find
the optimal measurement for a given $\alpha$. This leads to the classical channel shown in
Fig. 5. Here, $q$ and $\delta$ take the same values as above, and $\epsilon = \langle Q_0|T_1\rangle^2 = \langle Q_0|T_2\rangle^2$. As
we did for the case with $\epsilon = 0$, we can write down the channel capacity, differentiate
with respect to $p$, and solve the resulting equation. In this case, the solution turns out
to be

$$
p = \frac{1 - \epsilon - \epsilon\,2^{Z}}{(\delta - \epsilon)(1 + 2^{Z})} \tag{17}
$$

where

$$
Z = \frac{1 - \epsilon - H(q;\epsilon;1-q-\epsilon) + H(\delta)}{\delta - \epsilon}. \tag{18}
$$

Here

$$
H(p_1; p_2; \ldots; p_k) = -\sum_{j=1}^{k} p_j \log_2 p_j
$$

is the Shannon entropy of the probability distribution $\{p_1, p_2, \ldots, p_k\}$. We have
numerically found the optimum $\beta$ for the measurement $Q(\beta)$, and used this result to
obtain the $C_{1,1}$ capacity achieved by optimizing over both $p$ and $Q(\beta)$. This capacity
function is shown in Fig. 6. This capacity, and the capacities obtained using various
specific values of $\beta$ in $Q(\beta)$, are shown in Fig. 7. For $\alpha > 0.040491$, the optimum $\beta$ is
$\frac{2}{3}$; note that the measurement $Q(\frac{2}{3})$ is the same as the measurement $V(0)$ introduced
in Section 2, Eq. (5). The measurement $Q(\beta)$ appears to give the $C_{1,1}$ capacity for
$\alpha < 0.018073$ [and for $\alpha > 0.061367$, where the optimum measurement is $Q(\frac{2}{3})$].

Figure 5: The classical channel induced by the von Neumann measurement $Q(\beta)$ on the lifted
trines $T(\alpha)$. The inputs correspond (top to bottom) to $T_1$, $T_2$ and $T_0$; the outputs correspond
(top to bottom) to $Q_1$, $Q_2$ and $Q_0$.

Now, suppose that in the above expression for $p$ [Eqs. (17) and (18)], $\epsilon$ and $\delta$ are
both approaching 0, while $q$ is bounded away from $\frac{1}{2}$. If $\delta > \epsilon$, then $2^Z$ is exponentially
large in $1/(\delta - \epsilon)$, and the equation either gives a negative $p$ (in which case the optimum
value of $p$ is actually 0) or $p$ is exponentially small. If $\epsilon > \delta$, and both $\epsilon$ and $\delta$ are
sufficiently small, then $2^Z$ is exponentially small in $1/(\epsilon - \delta)$ and the value of $p$ in the
above equation is negative, so that the optimum value of $p$ is 0. There are solutions to
the above equation which have $\epsilon > \delta$ and positive $p$, but this is not the case when $\epsilon$ is
sufficiently close to 0.

It follows from the above argument that, as $\alpha$ goes to 0, the optimum probability
$p$ for the third trine state goes to 0 exponentially fast in $1/\delta$, and so the $C_{1,1}$ capacity
obtained from this measurement grows exponentially close to that for the two-trine
capacity, since the two probability distributions differ by an exponentially small amount.
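The stationary-point formula for the third guess can be compared with a brute-force maximization over $p$. The sketch below is ours and rests on labeled assumptions: the channel probabilities $\delta$, $\epsilon$ and $q$ are computed from the vectors of Eqs. (1) and (12), the values $\alpha = 0.02$ and $\beta = 0.09$ are arbitrary example choices (not the numerically optimal $\beta$), and Eq. (17) is evaluated through the equivalent form $p = (r - \epsilon)/(\delta - \epsilon)$ with $r = 1/(1 + 2^Z)$ the probability of outcome $Q_0$, clamped at 0 when the formula gives a negative value.

```python
import math

def lifted_trine(b, alpha):
    """T_b(alpha) from Eq. (1)."""
    r, a = math.sqrt(1.0 - alpha), 2.0 * math.pi * b / 3.0
    return (r * math.cos(a), r * math.sin(a), math.sqrt(alpha))

def q_basis(beta):
    """Projectors Q_0, Q_1, Q_2 of Eq. (12)."""
    a, c, s = math.sqrt(beta), math.sqrt(1.0 - beta), 1.0 / math.sqrt(2.0)
    return [(a, 0.0, c), (-s * c, s, s * a), (-s * c, -s, s * a)]

def dot(u, v):
    """Real inner product of two 3-vectors."""
    return sum(x * y for x, y in zip(u, v))

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(x * math.log2(x) for x in probs if x > 0.0)

def channel(alpha, beta):
    """Return (delta, eps, q) for the channel of Fig. 5."""
    q0, q1, q2 = q_basis(beta)
    t0, t1 = lifted_trine(0, alpha), lifted_trine(1, alpha)
    return dot(q0, t0) ** 2, dot(q0, t1) ** 2, dot(q2, t1) ** 2

def information(p, delta, eps, q):
    """H(X_out) - H(X_out | X_in) with input probabilities ((1-p)/2, (1-p)/2, p)."""
    r = eps + p * (delta - eps)                        # probability of outcome Q_0
    h_out = entropy([r, (1.0 - r) / 2.0, (1.0 - r) / 2.0])
    h_cond = ((1.0 - p) * entropy([q, eps, 1.0 - q - eps])
              + p * entropy([delta, (1.0 - delta) / 2.0, (1.0 - delta) / 2.0]))
    return h_out - h_cond

def optimal_p(delta, eps, q):
    """Eqs. (17)-(18), clamped at 0; written to avoid overflow of 2**Z."""
    z = (1.0 - eps - entropy([q, eps, 1.0 - q - eps])
         + entropy([delta, 1.0 - delta])) / (delta - eps)
    if z >= 0.0:
        w = 2.0 ** (-z)
        r = w / (1.0 + w)                              # equals 1 / (1 + 2**z)
    else:
        r = 1.0 / (1.0 + 2.0 ** z)
    return max((r - eps) / (delta - eps), 0.0)

if __name__ == "__main__":
    delta, eps, q = channel(0.02, 0.09)
    p = optimal_p(delta, eps, q)
    print("delta, eps, q =", delta, eps, q)
    print("p =", p, " information =", information(p, delta, eps, q))
```

Optimizing further over $\beta$, as the text describes, would wrap this computation in a one-dimensional search; that step is omitted here.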

We have now described our conjectured $C_{1,1}$ for the lifted trine states, and the
measurements and probability distributions that achieve it. In the next section, we will
show that there is an adaptive protocol which achieves a capacity $C_{1,A}$ considerably
better than our conjectured $C_{1,1}$ capacity. When $\alpha$ is close to 0, it is better by an
amount linear in $\alpha$. To rigorously prove that it is better, we need to find an upper
bound on the capacity $C_{1,1}$ which is less than $C_{1,A}$. We already noted that, as $\alpha$
approaches 0, all three of our guesses for $C_{1,1}$ become exponentially close. In Section 5
of this paper we prove that the true $C_{1,1}$ capacity must become exponentially close to
these guesses.

Because these three guesses for $C_{1,1}$ become exponentially close near $\alpha = 0$, they
all have the same derivative with respect to $\alpha$ at $\alpha = 0$. Our first guess, which used
only two of the three trines, is simple enough that we can compute this derivative
analytically, and we find that its value is

$$
\frac{\sqrt{3}}{2}\log_2\left(2 + \sqrt{3}\right) \approx 1.6454 \text{ bits}.
$$

This contrasts with our best adaptive protocol, which has the same capacity at $\alpha = 0$,
but which between $\alpha = 0$ and $\alpha = 0.087247$ has slope 4.42238 bits (see Fig. ??). Thus,
for small enough $\alpha$, the adaptive capacity $C_{1,A}$ is strictly larger than $C_{1,1}$.
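The slope comparison can be checked with a finite difference. The sketch below is ours: it assumes the two-trine capacity $1 - H(p)$ with $p$ the two-state error probability for the overlap $\langle T_1|T_2\rangle = (3\alpha - 1)/2$, and the analytic value $\frac{\sqrt{3}}{2}\log_2(2+\sqrt{3})$ is obtained by our own differentiation of that expression. The adaptive slope 4.42238 is simply the figure quoted in the text, not something this code computes.

```python
import math

ADAPTIVE_SLOPE = 4.42238   # bits per unit alpha, quoted in the text

def binary_entropy(x):
    """Binary Shannon entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def two_trine_capacity(alpha):
    """1 - H(p) for the two lifted trines T_1(alpha) and T_2(alpha)."""
    overlap = (3.0 * alpha - 1.0) / 2.0                # <T_1|T_2>
    error = 0.5 - 0.5 * math.sqrt(1.0 - overlap ** 2)  # (1 - sin(theta)) / 2
    return 1.0 - binary_entropy(error)

def slope_at_zero(step=1e-6):
    """Forward-difference estimate of d(capacity)/d(alpha) at alpha = 0."""
    return (two_trine_capacity(step) - two_trine_capacity(0.0)) / step

def analytic_slope():
    """Our closed form for the same derivative."""
    return math.sqrt(3.0) / 2.0 * math.log2(2.0 + math.sqrt(3.0))

if __name__ == "__main__":
    print("numerical slope:", slope_at_zero())
    print("analytic slope: ", analytic_slope())
    print("adaptive slope: ", ADAPTIVE_SLOPE)
```

Both estimates come out near 1.645 bits, well below the adaptive slope, which is the gap the argument in this section relies on.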
Figure 6: This graph contains five curves. The blue line BZ and the green curve AYZ are
the same as in Fig. 3. The maximum of these two curves is the accessible information for the
lifted trine states $T(\alpha)$ with equal probabilities for all three states. The maximum of all five
curves is the conjectured $C_{1,1}$ capacity. The three red curves with left endpoint C are the
three successively better guesses described in the text for the $C_{1,1}$ capacity. The lower red
curve CW is the $C_{1,1}$ capacity for just two of the lifted trine states. The middle red curve
CX is the capacity obtained using the same measurement $Q(4\alpha/(3\alpha + 1))$ that gives the lower
red curve CW, but with the probabilities of the three trine states optimized. Finally, the
top red curve CY is the $C_{1,1}$ capacity obtained by optimizing both the probabilities and the
measurement, but only over the limited class of measurements $Q(\beta)$. These three red curves
become exponentially close to each other as $\alpha$ approaches 0.
Figure 7: This figure shows the Shannon capacity obtained using various measurements on
the trine states $T(\alpha)$, while optimizing the input probability distribution on the trines. The
green curve AYZ is the same as in the previous figures. It is obtained using the von Neumann
measurement $Q(2/3)$, which is also the measurement $V(0)$. The violet curve CV is obtained
using the measurement $Q(0)$, which is optimal for the planar trines ($\alpha = 0$). The dashed
yellow curves are the capacities obtained by the measurement $Q(\sin^2\theta)$ where $\theta$ is taken at
intervals of 5° from 5° to 50°. The violet curve CV corresponds to $\theta = 0°$ and the green curve
AYZ to $\theta = 54.736°$. Finally, the red curve CY is the upper envelope of the dashed yellow
curves; it shows the capacity obtained by choosing the measurement $Q(\beta)$ with optimal $\beta$ for
each $\alpha < 0.040491$. For larger $\alpha$, this optimum is at $\beta = 2/3$, and is given by the green curve
YZ.
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4 The Adaptive Capacity Ci ^ 



As can be seen from Figure H3 the $C_{1,1}$ capacity is not concave in a. That is, there are 
two values of a such that the average of their $C_{1,1}$ capacities is larger than the $C_{1,1}$ 
capacity of their average. This is analogous to the situation we found while studying the 
accessible information for the probability distribution (1/3, 1/3, 1/3), where the curve giving 
the information attainable by von Neumann measurements was also not concave. In 
that case, we were able to obtain the convex hull of this curve by using a POVM to 
linearly interpolate between the two von Neumann measurements. Remarkably, we 
show that for the lifted trines example, the relationship between $C_{1,1}$ capacity and 
$C_{1,A}$ capacity is similar: protocols using adaptive measurement can attain the convex 
hull of the $C_{1,1}$ capacity with respect to a. 

We now introduce the adaptive measurement model leading to the $C_{1,A}$ capacity. If 
we assume that each of the signals that Bob receives is held by a separate party, this is 
the same as the LOCC model used in 011] where several parties share a quantum state 
and are allowed to use local quantum operations and classical communication between 
the parties. In our model, Alice sends Bob a tensor product codeword using the 
channel many times. We call the output from a single use of the channel a signal. Bob 
is not allowed to make joint quantum measurements on more than one signal, but he is 
allowed to make measurements sequentially on different signals. He is further allowed 
to use the classical outcomes of his measurements to determine which signal to measure 
next, and to determine which measurement to make on that signal. In particular, he 
is allowed to make a measurement which only partially reduces the quantum state of 
one signal, make intervening measurements on other signals, and return to make a 
further measurement on the reduced state of the original signal (which measurement 
may depend on the outcomes of intervening measurements). The information rate 
for a given encoding and measurement strategy is the mutual information between 
Alice's codewords and Bob's measurement outcomes, divided by the number of signals 
(channel uses) in the codeword. The adaptive one-shot capacity $C_{1,A}$ is defined to be 
the supremum of this information rate over all encodings and all measurement strategies 
that use quantum operations local to the separate signals (and classical computation to 
coordinate them). As we will show in Sectional to exceed C\ \ it is crucial to be able to 
refine a measurement made on a given signal after making intervening measurements 
on other signals. 

In our adaptive protocol for lifted trines, we use two rounds of measurements. We 
first make one measurement on each of the signals received; this measurement only 
partially reduces the quantum state of some of the signals. We then make a second 
measurement (on some of the signals) which depends on the outcomes of the first round 
of measurements. 

A precursor to this type of adaptive strategy appeared in an influential paper of 
Peres and Wootters J3] which was a source of inspiration for this paper. In their 
paper, Peres and Wootters studied strategies of adaptive measurement on the tensor 
product of two trine states, in which joint measurements on both copies were not 
allowed. They showed that for a specific encoding of block length 2, adaptive strategies 
could extract strictly more information than sequential strategies, but not as much as 
was extractable through joint measurements. However, the adaptive strategies they 
considered extracted less information than the $C_{1,1}$ capacity of the trine states, and so 
could have been improved on by using a different encoding and a sequential strategy. 
We show that for some values of a, $C_{1,A}$ is strictly greater than $C_{1,1}$ for the lifted 
trines T(a), where these capacities are defined using arbitrarily large block lengths 
and arbitrary encodings. It is open whether $C_{1,1} = C_{1,A}$ for the planar trine states. 

Before we describe our measurement strategy, we will describe the codewords we use 
for information transmission. The reason we choose these codewords will not become 
clear until after we have given the strategy. Alice will send one of these codewords to 
Bob, who with high probability will be able to deduce which codeword was sent from 
the outcomes of his measurements. These codewords are constructed using a two-stage 
scheme, corresponding to the two rounds of our measurement protocol. Effectively, we 
are applying Shannon's classical channel coding theorem twice. 

To construct a codeword, we take two error-correcting codes, each of block length 
n, and add them letterwise (mod 3). The first code is over a trinary alphabet (which 
we take to be {0, 1, 2}); it contains $2^{\tau_1 n - o(n)}$ codewords, and is a good classical error- 
correcting code for a classical channel we describe later. The second code contains 
$2^{\tau_2 n - o(n)}$ codewords, is over the binary alphabet {0, 1}, and is a good classical error- 
correcting code for a different classical channel. Such classical error-correcting codes 
can be constructed by taking the appropriate number of random codewords; for the 
proof that our decoding strategy works, we will assume that the codes were indeed 
constructed this way. To obtain our code, we simply add these two codes letterwise 
(mod 3). For example, if a codeword in the first (trinary) code is 0212 and a codeword 
in the second (binary) code is 1110, the codeword obtained by adding them letterwise is 
1022. This new code contains $2^{(\tau_1+\tau_2)n - o(n)}$ codewords (since we choose the two codes 
randomly, and we make sure that $\tau_1 + \tau_2 < \log_2 3$). 
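The letterwise addition of the two codes can be sketched in a few lines. This is only an illustration of the combining step; the function name `add_letterwise_mod3` and the representation of codewords as digit strings are assumptions made here, not part of the construction in the text.

```python
def add_letterwise_mod3(first, second):
    """Add a trinary word and a binary word letter by letter, modulo 3.

    Both words are given as strings of digits (an illustrative encoding).
    """
    if len(first) != len(second):
        raise ValueError("codewords must have the same block length")
    return "".join(str((int(x) + int(y)) % 3) for x, y in zip(first, second))


# The example from the text: 0212 (trinary code) plus 1110 (binary code).
combined = add_letterwise_mod3("0212", "1110")
print(combined)  # 1022
```

Because the second code only ever adds 0 or 1, each letter of the combined codeword narrows the first-stage symbol down to two candidates, which is what the two-stage decoding relies on.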

To show how our construction works, we first consider the following measurement 
strategy. This is not the best protocol we have found, but it provides a good illustration 
of how our protocols work. This uses the two-level codeword scheme described above. 
In this protocol, the first measurement we make uses a POVM which contains four 
elements. One of them is a scalar times the matrix 

$$\Pi_{xy} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

which projects the lifted trine states onto the planar trine states. The other three 
elements correspond to the three vectors which are each perpendicular to two of the 
trine states; these vectors are 

Figure 8: The classical channel corresponding to the first-stage code in our first adaptive 
protocol. The solid lines indicate a probability of 3a/2 for the transition; dashed lines a 
probability of 1 - 3a. For example, a symbol 0 in the first-stage code is first encoded with 
probability 1/2 each by trines $T_0$ and $T_1$. Considering the effects of the first measurement, a 
symbol 0 in the first-stage code is equally likely (probability 3a/2) to be taken to measurement 
outcomes $D_0$ and $D_1$, and is taken to outcome $\Pi_{xy}$ (which for this channel is essentially an 
erasure) with probability 1 - 3a. 



Note that $\langle D_{b_1} | T_{b_2} \rangle = 0$ if and only if $b_1 \neq b_2$. 

We now scale $\Pi_{xy}$ and the $D_i$ so as to make a valid POVM. For this, we need the POVM 
elements to sum to the identity, i.e., that $\sum_{i=0}^{3} A_i^\dagger A_i = I$. This is done by choosing 

$$A_i = \frac{1}{\sqrt{3(1-a)}}\, |D_i\rangle\langle D_i|, \quad i = 0, 1, 2, \qquad A_3 = \frac{\sqrt{1-3a}}{\sqrt{1-a}}\, \Pi_{xy}.$$

When we apply this POVM, the state $|v\rangle$ is taken to the state $A_i |v\rangle$ (up to normalization) 
with probability $\langle v | A_i^\dagger A_i | v \rangle$. When this operator is applied to a trine state $T_b(a)$, the 
chance of obtaining $D_b$ is 3a, and the chance of obtaining $\Pi_{xy}$ is 1 - 3a. If the outcome is $D_b$, we 
know we started with the trine $T_b$, since the other two trines are perpendicular to $D_b$. 
If we obtain the fourth outcome, $\Pi_{xy}$, then we gain no information about which of the 
three trine states we started with, since all three states are equally likely to produce 
this outcome. 
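The outcome probabilities 3a and 1 - 3a can be checked numerically. The sketch below assumes the usual parameterization of the lifted trines (in-plane length $\sqrt{1-a}$, height $\sqrt{a}$, directions 120° apart) and takes each $D_b$ to be the unnormalized vector orthogonal to the other two trines; the POVM elements are taken as $|D_i\rangle\langle D_i|/(3(1-a))$ and $\frac{1-3a}{1-a}\Pi_{xy}$. All function names are illustrative assumptions.

```python
import math


def lifted_trine(b, a):
    """Assumed form of T_b(a): in-plane length sqrt(1 - a), height sqrt(a)."""
    angle = 2 * math.pi * b / 3
    r = math.sqrt(1 - a)
    return (r * math.cos(angle), r * math.sin(angle), math.sqrt(a))


def d_vector(b, a):
    """Unnormalized vector orthogonal to the two lifted trines other than T_b."""
    angle = 2 * math.pi * b / 3
    r = 2 * math.sqrt(a)
    return (r * math.cos(angle), r * math.sin(angle), math.sqrt(1 - a))


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def outcome_probabilities(b, a):
    """Return ([P(D_0), P(D_1), P(D_2)], P(Pi_xy)) when T_b(a) is measured."""
    t = lifted_trine(b, a)
    # Assumed POVM elements: |D_i><D_i| / (3(1 - a)) and (1 - 3a)/(1 - a) * Pi_xy.
    p_d = [dot(d_vector(i, a), t) ** 2 / (3 * (1 - a)) for i in range(3)]
    p_xy = (1 - 3 * a) / (1 - a) * (t[0] ** 2 + t[1] ** 2)
    return p_d, p_xy


p_d, p_xy = outcome_probabilities(1, 0.1)
print(p_d, p_xy)
```

Under these assumptions only the outcome $D_b$ matching the sent trine has nonzero probability among the three $D$ outcomes, and the four probabilities sum to one.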

We now consider how this measurement combines with our two-stage coding scheme 
introduced above. We first show that with high probability we can decode our first 
code correctly. We then show that if we apply a further measurement to each of the 
signals which had the outcome $\Pi_{xy}$, with high probability we can decode our second 
code correctly. 

In our first measurement, for each outcome of the type $D_b$ obtained, we know that 
the trine sent was $T_b$. However, this does not uniquely identify the letter a in the 



Figure 9: The classical channel corresponding to the second-stage code in our first adaptive 
protocol. The solid lines indicate a probability of 0.64542(1 - 3a). The dotted lines indicate a 
probability of 0.35458(1 - 3a). The dashed lines indicate a probability of 3a. Note that this 
channel is symmetric if the inputs are interchanged; this means that the probability distribution 
(1/2, 1/2) maximizes the information transmission rate. 

corresponding position of the first-stage code, as the trine $T_b$ sent was obtained by 
adding either 0 or 1 to a (mod 3) to obtain b. Thus, if we obtained the outcome $D_b$, 
the corresponding symbol of our first-stage codeword is either b or b - 1 (mod 3), and 
because the second code is a random code, these two cases occur with equal probability. 
This is illustrated in Figure 8: if the symbol in the first-stage code is a, then the 
outcome of the first measurement will be $D_a$ with probability 3a/2, $D_{a+1 \pmod 3}$ with 
probability 3a/2, and $\Pi_{xy}$ with probability 1 - 3a (in the subscripts a is the code symbol; 
in the probabilities a is the lift parameter of T(a)). This is a classical channel with 
capacity $3a(\log_2 3 - 1)$: with probability 3a, we obtain an outcome $D_x$ for some x, and 
in this case we get $(\log_2 3 - 1)$ bits of information about the first-stage codeword; with 
probability 1 - 3a, we obtain $\Pi_{xy}$, which gives us no information about this codeword. 
By Shannon's classical channel coding theorem, we can now take $\tau_1 = 3a(\log_2 3 - 1)$, 
and choose a first-stage code with $2^{\tau_1 n - o(n)}$ codewords that is an error-correcting code 
for the classical channel shown in Fig. 8. Note that in this calculation, we are using the 
fact that the second-stage code is a random code, to say that measurement outcomes 
$D_a$ and $D_{a+1 \pmod 3}$ are equally likely. 
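The value $3a(\log_2 3 - 1)$ can be confirmed by computing the mutual information of this classical channel at the uniform input distribution, which is optimal here by symmetry. The sketch below is a direct numerical check; the helper names `mutual_information` and `first_stage_channel` are assumptions introduced for illustration.

```python
import math


def mutual_information(p, q):
    """Mutual information in bits for input distribution p and transition matrix q."""
    n_in, n_out = len(p), len(q[0])
    out = [sum(p[i] * q[i][j] for i in range(n_in)) for j in range(n_out)]
    h_out = -sum(x * math.log2(x) for x in out if x > 0)
    h_cond = -sum(
        p[i] * q[i][j] * math.log2(q[i][j])
        for i in range(n_in)
        for j in range(n_out)
        if q[i][j] > 0
    )
    return h_out - h_cond


def first_stage_channel(a):
    """Rows: first-stage symbols 0, 1, 2. Columns: outcomes D_0, D_1, D_2, Pi_xy."""
    rows = []
    for s in range(3):
        row = [0.0, 0.0, 0.0, 1 - 3 * a]
        row[s] = 1.5 * a            # outcome D_s with probability 3a/2
        row[(s + 1) % 3] = 1.5 * a  # outcome D_{s+1 mod 3} with probability 3a/2
        rows.append(row)
    return rows


a = 0.1
rate = mutual_information([1 / 3] * 3, first_stage_channel(a))
print(rate, 3 * a * (math.log2(3) - 1))
```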

Once we have decoded the first-stage code, the uncertainty about which codeword 
was sent is reduced to decoding the second-stage code. Because the second code is 
binary, the decoding of the first-stage code leaves in each position only two possible 
trines consistent with this decoding. This means that in the approximately (1 - 3a)n 
positions where the trines are projected into the plane, we now need only distinguish 
between two of the three possible trines. In these positions, we can use the optimal 
measurement for distinguishing between two planar trine states; recall that this is a von 
Neumann measurement which gives $1 - H(1/2 + \sqrt{3}/4) = 0.64542$ bits of information 
per position. In the approximately 3an remaining positions, we still know which 
outcome $D_b$ was obtained in the first round of measurements, and this outcome tells 
us exactly which trine was sent. Decoding the first-stage code left us with two equally 
likely possibilities in the second-stage code for each of these positions. We thus obtain 
one bit of information about the second-stage code for each of these approximately 3an 
positions. Thus, if we set $\tau_2 = 0.64542(1 - 3a) + 3a$ bits, by Shannon's theorem there is 
a classical error-correcting code which can be used for the second stage and which can 
almost certainly be decoded uniquely. Adding $\tau_1$ and $\tau_2$, we obtain a channel capacity 
of $0.64542(1 - 3a) + (\log_2 3)(3a)$ bits; this is the line which interpolates between the 
points a = 0 and a = 1/3 on the curve for $C_{1,1}$. As can be seen from Figure 11, for 
small a this strategy indeed does better than our best protocol for $C_{1,1}$. 
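The two rates of this first adaptive protocol, and the fact that their sum is the straight line between the endpoint values at a = 0 and a = 1/3, can be reproduced with a short calculation. This is a sketch of the arithmetic only; the names `h2` and `first_protocol_rates` are illustrative.

```python
import math


def h2(x):
    """Binary entropy in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


# Information per position when distinguishing two planar trines (about 0.64542 bits).
PLANAR_TWO_TRINE_BITS = 1 - h2(0.5 + math.sqrt(3) / 4)


def first_protocol_rates(a):
    """Return (tau_1, tau_2) for the first adaptive protocol at lift parameter a."""
    tau1 = 3 * a * (math.log2(3) - 1)
    tau2 = PLANAR_TWO_TRINE_BITS * (1 - 3 * a) + 3 * a
    return tau1, tau2


for a in (0.0, 0.05, 1 / 3):
    tau1, tau2 = first_protocol_rates(a)
    print(a, tau1 + tau2)
```

At a = 0 the total is the two-trine planar value, and at a = 1/3 (orthogonal states) it is $\log_2 3$.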

The above strategy can be viewed in a slightly different way, this time as a three- 
step process. In the first step our measurement either lifts the three trine states up 
until they are all orthogonal, or projects them into the plane. This first step lifts 
approximately 3an trines up and projects approximately (1 — 3a)n trines into the 
plane. After this first step, we first measure the trines that are lifted further out of the 
plane, yielding log 2 3 bits of information for each of these approximately 3an positions. 
The rest of the strategy then proceeds exactly as above. Note that this reinterpretation 
of the strategy is reminiscent of the two-stage description of the six-outcome POVM 
for accessible information in Section El 

We now modify the above protocol to give the best protocol we currently know 
for the adaptive capacity $C_{1,A}$. We first make a measurement which either projects 
the trines T(a) to the planar trines T(0) or lifts them out of the plane to some fixed 
height, yielding the trines $T(\gamma_2)$; this measurement requires $a < \gamma_2$. (Our first strategy 
is obtained by setting $\gamma_2 = 1/3$.) We choose $\gamma_2 = 0.087247$; this is the point where the 
convex hull of the curve representing the $C_{1,1}$ capacity meets this curve (see Figure 10); 
at this point $C_{1,1}$ is 1.03126 bits. It is easy to verify that with probability $1 - a/\gamma_2$, 
the lifted trine $T_b(a)$ is projected onto a planar trine $T_b(0)$, and with probability 
$a/\gamma_2$, it is lifted up to $T_b(\gamma_2)$. We next use the optimum von Neumann measurement 
V(0) = Q(2/3) on the trine states that were lifted out of the plane. 
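The probabilities $1 - a/\gamma_2$ and $a/\gamma_2$ follow from a two-outcome measurement with diagonal operators. The sketch below assumes one natural choice, $A_{\rm lift} = {\rm diag}(u, u, 1)$ and $A_{\rm plane} = {\rm diag}(s, s, 0)$ with $s^2 + u^2 = 1$, and the usual form of $T_b(a)$ with squared in-plane length 1 - a and squared height a; these operators and the function name are assumptions for illustration, not taken from the text.

```python
def lift_probabilities(a, gamma2):
    """Return (P(projected to plane), P(lifted), squared height after lifting)."""
    # u^2 is fixed by requiring the lifted state to be proportional to T_b(gamma2).
    u2 = a * (1 - gamma2) / (gamma2 * (1 - a))
    if u2 < 0 or u2 > 1 + 1e-12:
        raise ValueError("the measurement requires 0 <= a <= gamma2")
    p_lift = u2 * (1 - a) + a       # squared norm of A_lift T_b(a)
    p_plane = (1 - u2) * (1 - a)    # squared norm of A_plane T_b(a)
    height_sq = a / p_lift          # squared height of the normalized lifted state
    return p_plane, p_lift, height_sq


gamma2 = 0.087247
p_plane, p_lift, height_sq = lift_probabilities(0.04, gamma2)
print(p_plane, p_lift, height_sq)
```

With $\gamma_2 = 1/3$ the same function gives a lifting probability of 3a, matching the first strategy.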

We now analyze this protocol in more detail. Let 

$$p = \langle V_b(0) | T_b(\gamma_2) \rangle^2 = 0.90364. \qquad (19)$$

The first-stage code gives an information gain of 

$$\log_2 3 - H\left(\tfrac{1+p}{4};\ \tfrac{1+p}{4};\ \tfrac{1-p}{2}\right) = 0.35453 \text{ bits}$$

for each of the approximately $(a/\gamma_2)n$ signals which were lifted out of the plane. This 
is because the symbol a in the first-stage code is first taken to $T_a$ or $T_{a+1 \pmod 3}$ with 
a probability of 1/2 each (depending on the value of the corresponding letter in the 
second-stage code). We thus obtain each of the two outcomes $V_a$, $V_{a+1 \pmod 3}$ with 
probabilities $\tfrac12 p + \tfrac12 (1-p)/2 = \tfrac14 (1+p)$, and obtain the outcome $V_{a+2 \pmod 3}$ with 
probability $\tfrac12 (1-p)$. Thus, if we start with the symbol a in the first-stage code, the 
entropy of the outcome of the measurement (i.e., $H(X_{\rm out} | X_{\rm in})$) is $H(\tfrac{1+p}{4}; \tfrac{1+p}{4}; \tfrac{1-p}{2})$; it 
is easy to see that $H(X_{\rm out})$ is $\log_2 3$. From classical Shannon theory, we find that we 
can take $\tau_1 = 0.35453(a/\gamma_2)$. 

The design of our codewords ensures that knowledge of the first codeword eliminates 
one of the three signal states for each of the trines projected into the plane. This allows 
us to use the optimal $C_{1,1}$ measurement on the planar trines, resulting in an information 
gain of 0.64542 bits for each of these approximately $(1 - a/\gamma_2)n$ signals. We obtain 

$$\tfrac12 (1+p) \left[ 1 - H\left( \tfrac{1-p}{1+p} \right) \right] = 0.67673 \text{ bits}$$

for each of the approximately $(a/\gamma_2)n$ signals that were lifted out of the plane; this will 
be explained in more detail later. Put together, this results in a capacity of 

$$C_{1,A} = 0.64542(1 - a/\gamma_2) + 1.03126(a/\gamma_2) \qquad (20)$$

bits per signal; this formula linearly interpolates between the $C_{1,1}$ capacity for T(0) 
and the $C_{1,1}$ capacity for $T(\gamma_2)$. 
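Formula (20) is a straight line in a, so its slope is fixed by the two endpoint values. The sketch below evaluates the line and its slope using only the constants quoted in the text; the function name `adaptive_rate` is an illustrative assumption.

```python
GAMMA2 = 0.087247        # lift parameter where the interpolating line meets the C_{1,1} curve
PLANAR_BITS = 0.64542    # rate for the planar trines T(0)
LIFTED_BITS = 1.03126    # C_{1,1} for the trines T(gamma_2)


def adaptive_rate(a):
    """Rate of the second adaptive protocol, Eq. (20), for 0 <= a <= GAMMA2."""
    if not 0 <= a <= GAMMA2:
        raise ValueError("Eq. (20) applies only for 0 <= a <= gamma_2")
    weight = a / GAMMA2
    return PLANAR_BITS * (1 - weight) + LIFTED_BITS * weight


slope = (LIFTED_BITS - PLANAR_BITS) / GAMMA2
print(adaptive_rate(0.04), slope)
```

The slope evaluates to about 4.4224 bits per unit of a.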

Why do we get the weighted average of $C_{1,1}$ for T(0) and $C_{1,1}$ for $T(\gamma_2)$ as the 
$C_{1,A}$ for T(a), $0 < a < \gamma_2$? This happens because we use all the information that was 
extracted by both of the measurements. The measurements on the trines that were 
projected onto the plane only give information about the second code, and provide 
0.64542 bits of information per trine. For the trines $T(\gamma_2)$ that were lifted out of the 
plane, we use part of the information extracted by their measurements to decode the 
first code, and part to decode the second code. For these trines, in the first step, we 
start with the symbols {0, 1, 2} of the first-stage code with equal probabilities. A 
symbol 0 gives measurement outcome $V_0$ with probability $\tfrac{1+p}{4}$, $V_1$ with probability $\tfrac{1+p}{4}$, 
and $V_2$ with probability $\tfrac{1-p}{2}$, and similarly for the other symbols. The information gain 
from this step is thus 

$$\log_2 3 - H\left(\tfrac{1+p}{4};\ \tfrac{1+p}{4};\ \tfrac{1-p}{2}\right) = 0.35453 \text{ bits} \qquad (21)$$

per signal. At the start of the second step, we have narrowed the possible states for 
each signal down to two equally likely possibilities. The information gain for this step 
comes from the case where the outcome of the measurement was $V_b$ and where one of 
the two possible states (consistent with the first-stage code) is $T_b$. This happens with 
probability $\tfrac12 (1+p)$. In this case, we obtain a binary symmetric channel with crossover 
probability $2p/(1+p)$. In the other case, where the measurement was $V_b$ and neither of 
the two possible states consistent with the first-stage code is $T_b$, we gain no additional 
information, since both possible states remain equally likely. The information gain 
from the second step is thus 

$$\tfrac12 (1+p) \left[ 1 - H\left( \tfrac{2p}{1+p} \right) \right] = 0.67673 \text{ bits} \qquad (22)$$

per signal. Adding the information gains from the first two stages [Eqs. (21) and (22)] 
together gives 

$$\log_2 3 - H\left(p;\ \tfrac{1-p}{2};\ \tfrac{1-p}{2}\right), \qquad (23)$$

which is the full information gain from the measurement on the trine states that were 
lifted further out of the plane; that this happens is, in some sense, an application of 
the chain rule for classical entropy [2]. 

Figure 10: The three red curves AZ, BZ and CZ are all shown in Fig. HO; their maximum is 
the best value known for the $C_{1,1}$ capacity. The blue line CZ is straight; it is the second adaptive 
strategy discussed in this section, and is the largest value we know how to obtain for $C_{1,A}$. 

Figure 11: All the curves in Figure 10 are shown, for 0 < a < 1/3, along with the purple line CN 
and the brown curve DN. The maximum of the three red curves is the best value for the $C_{1,1}$ 
capacity we have found. The line CN is the capacity of the first adaptive strategy discussed 
in this section. The line CZ is the capacity of the second adaptive strategy. The curve DN is 
the $C_{1,\infty}$ capacity (i.e., the Holevo bound). 
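The claim that the gains (21) and (22) add up to the full gain (23) can be checked numerically for the value p = 0.90364 of Eq. (19). The sketch below is only a verification of that arithmetic; the function names are illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(x * math.log2(x) for x in probs if x > 0)


def stage_gains(p):
    """Return (first-stage gain, second-stage gain, full gain) per lifted signal."""
    first = math.log2(3) - entropy([(1 + p) / 4, (1 + p) / 4, (1 - p) / 2])
    second = 0.5 * (1 + p) * (1 - entropy([2 * p / (1 + p), (1 - p) / (1 + p)]))
    full = math.log2(3) - entropy([p, (1 - p) / 2, (1 - p) / 2])
    return first, second, full


first, second, full = stage_gains(0.90364)
print(first, second, first + second, full)
```

The identity first + second = full holds for every p in (0, 1), not just this value, which is the chain-rule statement made in the text.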

5 The upper bound on $C_{1,1}$ 

In this section, we show that for the lifted trines T(a), if a is small, then the $C_{1,1}$ 
capacity is exponentially close to the accessible information obtainable using just two 
of our trines, showing that the three red curves in Fig. 6 are exponentially close when 
a is close to 0. 

First, we need to prove that in a classical channel, changing the transition proba- 
bilities by $\epsilon$ can only change the Shannon information by $O(-\epsilon \log_2 \epsilon)$. The Shannon 
information of a classical channel with input distribution $p_i$ and transition probabilities $q_{ij}$ 
is the entropy of the output less the entropy of the output given the input, or 

$$I_S = -\sum_{j=0}^{N_{\rm out}-1} \left( \sum_{i=0}^{N_{\rm in}-1} p_i q_{ij} \right) \log_2 \sum_{i=0}^{N_{\rm in}-1} p_i q_{ij} + \sum_{i=0}^{N_{\rm in}-1} \sum_{j=0}^{N_{\rm out}-1} p_i q_{ij} \log_2 q_{ij}. \qquad (24)$$

Suppose we change all the $q_{ij}$ by at most $\epsilon$. I claim that the above expression changes 
by at most $-2 N_{\rm out}\, \epsilon \log_2 \epsilon$. Each of the terms $q_{ij} \log_2 q_{ij}$ in the second term changes 
by at most $-\epsilon \log_2 \epsilon$, and adding these changes (with weights $p_i$) gives a total change 
of at most $-N_{\rm out}\, \epsilon \log_2 \epsilon$. Similarly, each of the terms $\sum_i p_i q_{ij}$ in the first term of (24) 
changes by at most $\epsilon$, and there are at most $N_{\rm out}$ of them, so we see easily that the 
first term also contributes at most $-N_{\rm out}\, \epsilon \log_2 \epsilon$ to the change. For $N_{\rm out} \le 6$, which 
by the real version of Davies' theorem is sufficient for the optimum measurement on 
the lifted trines, we have that the change is at most $-12 \epsilon \log_2 \epsilon$. 
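As an illustration of the lemma, the sketch below evaluates the Shannon information (24) for a small channel before and after perturbing every transition probability by $\epsilon$ = 0.01, and compares the change with the bound $-2 N_{\rm out}\, \epsilon \log_2 \epsilon$. The specific channel and the helper names are arbitrary choices made here for the example.

```python
import math


def shannon_information(p, q):
    """Eq. (24): output entropy minus conditional entropy, in bits."""
    n_in, n_out = len(p), len(q[0])
    total = 0.0
    for j in range(n_out):
        mix = sum(p[i] * q[i][j] for i in range(n_in))
        if mix > 0:
            total -= mix * math.log2(mix)
    for i in range(n_in):
        for j in range(n_out):
            if q[i][j] > 0:
                total += p[i] * q[i][j] * math.log2(q[i][j])
    return total


def perturbation_bound(n_out, eps):
    """The bound -2 * N_out * eps * log2(eps) from the lemma."""
    return -2 * n_out * eps * math.log2(eps)


p = [0.5, 0.5]
q = [[0.9, 0.1], [0.2, 0.8]]
q_perturbed = [[0.89, 0.11], [0.21, 0.79]]  # every entry moved by eps = 0.01

change = abs(shannon_information(p, q_perturbed) - shannon_information(p, q))
print(change, perturbation_bound(2, 0.01))
```

A single example does not prove the lemma, but it shows how loose the bound is for a typical small perturbation.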

Next, we need to know that the $C_{1,1}$ capacity for the planar trines is maximized 
using the probability distribution (0, 1/2, 1/2). One can easily convince oneself of this by 
inspecting Figure 14. We will discuss this at more length in Appendix A, where we 
sketch a proof that the point (0, 1/2, 1/2) is a local maximum for the accessible information. 
The proof also shows that to achieve a capacity close to $C_{1,1}$ one must use a probability 
distribution and measurement close to those achieving the optimum, a fact we will be 
using in this section. 

Now, we can deal with lifted trines. We will consider the trine states T(a) for 
small a. By moving each of the trines T(a) by an angle of $\phi = \arcsin \sqrt{a} < 2\sqrt{a}$, we can 
obtain the planar trines T(0). We now have that the difference between the transition 
probabilities for T(a) and T(0) for any rank 1 element in a POVM is at most $\phi$, since 
the transition probability for a fixed POVM element $r |v\rangle\langle v|$ is a constant multiple 
(with the constant being $r \le 1$) of the square of the cosine of the angle between the 
vectors $|v\rangle$ and $|T_b\rangle$, and this angle changes by at most $\phi$. Thus, by the lemma above, 
the $C_{1,1}$ capacity for the lifted trines T(a) differs by no more than $\delta = -12 \phi \log_2 \phi$ 
from the capacity obtained when the same probabilities (and measurement) are used 
for the planar trine states T(0). 

For the lifted trine states T(a), we know (from Section 2) that the $C_{1,1}$ capacity 
$C_{1,1}(a)$ is greater than $C_{1,1}(0)$, the capacity for the planar trine states. If we apply to 
the planar trine states T(0) the same measurements and the same probability distri- 
bution that give the optimum $C_{1,1}$ capacity for the lifted trine states T(a), we know 
that we have changed the capacity by at most $\delta = -12 \phi \log_2 \phi$. We thus have that 
$C_{1,1}(0) \le C_{1,1}(a) \le C_{1,1}(0) + \delta$, and that the measurements and probability distri- 
bution that yield the optimum capacity $C_{1,1}(a)$ for the lifted trines T(a) must give 
a capacity of at least $C_{1,1}(0) - \delta$ when applied to the planar trines. This limits the 
probability distribution and measurement giving the optimum capacity $C_{1,1}$ for the 
lifted trines. For sufficiently small a, the optimum probability distribution on the 
trines must be close to (0, 1/2, 1/2), and when the optimum projectors are projected onto 
the plane, nearly all the mass must be contained in projectors within a small angle of 
the optimum projectors for $C_{1,1}(0)$, namely $\tfrac{1}{\sqrt{2}}(1, \pm 1)$. 

We now consider the derivative in the information capacity obtained when the 
measurement is held fixed, and $p_0$ is increased at a rate of 1 while $p_1$ and $p_2$ are 
decreased, each at a rate of 1/2. Taking the derivative of (24), we obtain 

$$I'_S = -\sum_{j=0}^{5} \left( \sum_{i=0}^{2} p'_i q_{ij} \right) \log_2 \sum_{i=0}^{2} p_i q_{ij} + \sum_{i=0}^{2} \sum_{j=0}^{5} p'_i q_{ij} \log_2 q_{ij}, \qquad (25)$$

where $p'_0 = 1$ and $p'_1 = p'_2 = -1/2$. This derivative can be broken into terms associated 
with each of the projectors in the measurement. Namely, if the jth POVM element is 
$r_j |v\rangle\langle v|$, then the associated term is 

$$-r_j \left( \sum_{i=0}^{2} p'_i q_{iv} \right) \log_2 \sum_{i=0}^{2} p_i q_{iv} + r_j \sum_{i=0}^{2} p'_i q_{iv} \log_2 q_{iv},$$

where $q_{iv} = |\langle T_i | v \rangle|^2$, $p'_0 = 1$ and $p'_1 = p'_2 = -1/2$. The $r_j$ can be factored out, and 
this term can be written as $r_j I'_v$ where we define 

$$I'_v = -\left( \sum_{i=0}^{2} p'_i q_{iv} \right) \log_2 \sum_{i=0}^{2} p_i q_{iv} + \sum_{i=0}^{2} p'_i q_{iv} \log_2 q_{iv}. \qquad (26)$$

It is easy to see that for the planar trines, a projector $v_\theta = (\cos\theta, \sin\theta)$ with $\theta$ suffi- 
ciently close to $\pm\pi/4$, and a probability distribution sufficiently close to (0, 1/2, 1/2), the 
term $I'_{v_\theta}$ is approximately -0.3227 bits, the value of $I'_{v_\theta}$ at $\theta = \pm\pi/4$ and the probabil- 
ity distribution (0, 1/2, 1/2). Similarly, for the planar trines, if the probability distribution 
is close to (0, 1/2, 1/2), $I'_{v_\theta}$ can never be much greater than 2 bits, which is the maximum 
value for the probability distribution (0, 1/2, 1/2) (occurring at $\theta = 0$). These facts show 
that if a is sufficiently small, then the formula (25) for the derivative of the accessible 
information is negative and bounded above by (say) -0.64 bits when the planar trines 
are measured with the optimum POVM for $C_{1,1}(a)$. This is true because $\sum_j r_j = 2$, 
and at least a $1 - \epsilon$ fraction of the mass must be in projectors $v_\theta$ for $\theta$ near $\pm\pi/4$; 
each of these projectors will contribute a negative term of magnitude at least $0.321 r_j$ (say) 
to the derivative, and the projectors with $\theta$ not near $\pm\pi/4$ cannot change this result by more 
than $4\epsilon$. For the optimum measurement and probability distribution for $C_{1,1}(a)$ to have a non-zero 
value of $p_0$ (from Section 3 we know it does), this negative derivative must be balanced 
by a positive derivative acquired by some projectors when the trines are lifted out of 
the plane. We will show that this can only happen when $p_0$ is exponentially small in 
$1/\delta$; for larger values of $p_0$, the positive component acquired when the trines are lifted 
out of the plane is dwarfed by the negative component retained from the planar trines. 
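The two numerical values quoted for $I'_{v_\theta}$ can be reproduced directly from definition (26). The sketch below assumes the planar trines point at angles 0°, 120° and 240°, with $T_0$ along the x-axis, so that $q_{iv} = \cos^2(\theta - 2\pi i/3)$; this parameterization and the function name are assumptions made for the example.

```python
import math


def derivative_term(theta, p, dp=(1.0, -0.5, -0.5)):
    """I'_v of Eq. (26) for the planar projector v = (cos theta, sin theta)."""
    q = [math.cos(theta - 2 * math.pi * i / 3) ** 2 for i in range(3)]
    mix = sum(pi * qi for pi, qi in zip(p, q))
    first = -sum(d * qi for d, qi in zip(dp, q)) * math.log2(mix)
    second = sum(d * qi * math.log2(qi) for d, qi in zip(dp, q) if qi > 0)
    return first + second


p_opt = (0.0, 0.5, 0.5)
print(derivative_term(math.pi / 4, p_opt))   # approximately -0.3227
print(derivative_term(-math.pi / 4, p_opt))  # the same value, by symmetry
print(derivative_term(0.0, p_opt))           # approximately 2
```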

We have shown that $I'_S < -0.64$ bits near the probability distribution (0, 1/2, 1/2) when 
the optimal measurement for T(a) is applied to the planar trines, assuming sufficiently 
small a. We also know from the concavity of the mutual information that for the lifted 
trines T(a) with a > 0, the derivative $I'_S$ is positive for any probability distribution 
$(p_0 - t,\ p_1 + t/2,\ p_2 + t/2)$ where $(p_0, p_1, p_2)$ is the optimal probability distribution for the 
$C_{1,1}$ capacity and $0 < t \le p_0$. Thus, we know that the negative derivative for the planar trines must 
be balanced by a positive derivative acquired by some projectors when one considers 
the difference between the planar trines and the lifted trines. We will show that this 
can only happen when the probability $p_0$ is exponentially small in $1/\phi$, where $\phi$ is the lifting angle. 
This shows that at the probability distribution achieving $C_{1,1}$, $p_0$ is exponentially small 
in $1/\phi = 1/\arcsin \sqrt{a}$. 

Consider the change in the derivative $I'_v$ for a given projector $v_j$ when the trines 
T(0) are lifted out of the plane to become the trines T(a). To make $I'_S$ positive, 
the total change must be at least 0.64 bits. Let the transition probabilities with the op- 
timal measurement for $C_{1,1}(a)$ be $r_j q_{iv_j}$ and the transition probabilities for the same 
measurement applied to the planar trines be $r_j \tilde{q}_{iv_j}$. Since the constant factors $r_j$ mul- 
tiplying the projectors sum to 3, we have that the value $I'_v$ for one projector v must 
change by at least 0.21 bits, that is, 

$$-\left( \sum_{i=0}^{2} p'_i q_{iv} \right) \log_2 \sum_{i=0}^{2} p_i q_{iv} + \left( \sum_{i=0}^{2} p'_i \tilde{q}_{iv} \right) \log_2 \sum_{i=0}^{2} p_i \tilde{q}_{iv} + \sum_{i=0}^{2} p'_i \left( q_{iv} \log_2 q_{iv} - \tilde{q}_{iv} \log_2 \tilde{q}_{iv} \right) \ge 0.21, \qquad (27)$$

where $\tilde{q}_{iv} = |\langle T_i(0) | v \rangle|^2$ and $q_{iv} = |\langle T_i(a) | v \rangle|^2$, as before. We know $|q_{iv} - \tilde{q}_{iv}| \le \phi$. 
We will first consider the last term of (27), 

$$\sum_{i=0}^{2} p'_i \left( q_{iv} \log_2 q_{iv} - \tilde{q}_{iv} \log_2 \tilde{q}_{iv} \right).$$

This is easily seen to be bounded by $-6 \phi \log_2 \phi$, which approaches 0 as a approaches 
0. 

Next, consider the first terms of (27), 

$$\left( \sum_{i=0}^{2} p'_i \tilde{q}_{iv} \right) \log_2 \sum_{i=0}^{2} p_i \tilde{q}_{iv} - \left( \sum_{i=0}^{2} p'_i q_{iv} \right) \log_2 \sum_{i=0}^{2} p_i q_{iv}.$$

Bounding this is a little more complicated. First, we will derive a relation among the 
values of $\tilde{q}_{iv}$ for different i. We use the fact that for the planar trines 

$$|T_0(0)\rangle = -|T_1(0)\rangle - |T_2(0)\rangle.$$

Taking the inner product with $\langle v_j|$, we get 

$$\langle v_j | T_0(0) \rangle = -\langle v_j | T_1(0) \rangle - \langle v_j | T_2(0) \rangle.$$

And now, using the fact that $\tilde{q}_{iv} = |\langle v_j | T_i(0) \rangle|^2$, we see that 

$$\tilde{q}_{0v} \le 2 (\tilde{q}_{1v} + \tilde{q}_{2v}). \qquad (28)$$

Using (28), and the fact that $p_1$, $p_2$ are close to 1/2, we have that 

$$\sum_{i=0}^{2} p_i \tilde{q}_{iv} \ge \frac{1}{8} \left( \tilde{q}_{0v} + \tilde{q}_{1v} + \tilde{q}_{2v} \right) \qquad (29)$$

for sufficiently small a. We also need a relation among the $q_{iv}$. We have 

$$q_{0v} + q_{1v} + q_{2v} = \langle v | \left( \sum_{i=0}^{2} |T_i\rangle\langle T_i| \right) | v \rangle \ge \frac{1}{4} \phi^2, \qquad (30)$$

where the second step follows because the minimum eigenvalue of $|T_0\rangle\langle T_0| + |T_1\rangle\langle T_1| + 
|T_2\rangle\langle T_2|$ is at least $a > \phi^2/4$. 

We now are ready to bound the first terms of (27). We break this expression into two pieces; 
if it is at least 0.2 bits, then one of these two pieces must be at least 0.1 bits. 
The two pieces are as follows: 

$$\left( \sum_{i=0}^{2} p'_i (\tilde{q}_{iv} - q_{iv}) \right) \log_2 \sum_{i=0}^{2} p_i q_{iv} \qquad (31)$$

and 

$$\left( \sum_{i=0}^{2} p'_i \tilde{q}_{iv} \right) \left( \log_2 \sum_{i=0}^{2} p_i \tilde{q}_{iv} - \log_2 \sum_{i=0}^{2} p_i q_{iv} \right). \qquad (32)$$

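We first consider the case of (31). Assume that it is larger than 0.1 bits. Then 

$$\left( \sum_{i=0}^{2} p'_i (\tilde{q}_{iv} - q_{iv}) \right) \log_2 \sum_{i=0}^{2} p_i q_{iv} \le -2\phi \log_2 \left( p_0 \sum_{i=0}^{2} q_{iv} \right) \qquad (33)$$

$$\le -2\phi \log_2 \frac{p_0 \phi^2}{4}, \qquad (34)$$

where the first step follows from the facts that $|q_{iv} - \tilde{q}_{iv}| \le \phi$ and that $p_0$ is the smallest 
of the $p_i$, and the second step follows from (30). Thus, if the quantity (31) is at 
least 0.1, we have that 

$$p_0 \le \frac{4}{\phi^2}\, 2^{-0.05/\phi},$$

showing that $p_0$ is exponentially small in $1/\phi$. 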
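To get a feeling for how quickly this bound shrinks, the sketch below evaluates it for a few small lift parameters. It takes the bound in the form $p_0 \le (4/\phi^2)\, 2^{-0.05/\phi}$, which is what the inequality $0.1 \le -2\phi \log_2(p_0 \phi^2/4)$ rearranges to (this reading of the bound is an assumption), with $\phi = \arcsin\sqrt{a}$. The function name is illustrative.

```python
import math


def p0_upper_bound(a):
    """Evaluate (4 / phi^2) * 2^(-0.05 / phi) with phi = arcsin(sqrt(a))."""
    phi = math.asin(math.sqrt(a))
    return (4 / phi ** 2) * 2 ** (-0.05 / phi)


for a in (1e-5, 1e-6, 1e-7):
    phi = math.asin(math.sqrt(a))
    print(a, phi, p0_upper_bound(a))
```

The bound carries no information until a is quite small (it exceeds 1 at a = 1e-5), after which the factor $2^{-0.05/\phi}$ dominates the polynomial prefactor and the bound drops very rapidly.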
We next consider the case (32). Assume that 

$$\left( \sum_{i=0}^{2} p'_i \tilde{q}_{iv} \right) \left( \log_2 \sum_{i=0}^{2} p_i \tilde{q}_{iv} - \log_2 \sum_{i=0}^{2} p_i q_{iv} \right)$$

is larger than 0.1 bits. We know that 

$$\left| \sum_{i=0}^{2} p'_i \tilde{q}_{iv} \right| \le \sum_{i=0}^{2} |p'_i| = 2.$$

Thus, for (32) to be larger than 0.1, we must have that 

$$\left| \log_2 \frac{\sum_{i=0}^{2} p_i \tilde{q}_{iv}}{\sum_{i=0}^{2} p_i q_{iv}} \right| \ge 0.05.$$

We know that the numerator and denominator inside the logarithm differ by at most $\phi$. 
It is easy to check that if $|\log_2(x/y)| > 0.05$, and $|x - y| < \phi$, then both x and y are at 
most $15\phi$. Thus, 

$$\sum_{i=0}^{2} p_i \tilde{q}_{iv} \le 15\phi. \qquad (35)$$

Further, 

$$\left| \sum_{i=0}^{2} p'_i \tilde{q}_{iv} \right| \le \sum_{i=0}^{2} \tilde{q}_{iv} \le 8 \sum_{i=0}^{2} p_i \tilde{q}_{iv} \le 120\phi, \qquad (36)$$

where the second inequality follows by (29) and the third by (35). 

Since the two terms in (32) are of opposite signs, if they add up to at least 0.1 
bits, at least one of them must exceed 0.1 bits by itself. We will treat these two cases 
separately. First, assume that the term containing $\log_2 \sum_i p_i q_{iv}$ exceeds 0.1 in absolute value. Then 

$$0.1 \le -\left| \sum_{i=0}^{2} p'_i \tilde{q}_{iv} \right| \log_2 \sum_{i=0}^{2} p_i q_{iv} \le -\left| \sum_{i=0}^{2} p'_i \tilde{q}_{iv} \right| \log_2 \sum_{i=0}^{2} p_0 q_{iv} \le -120\phi \log_2 \frac{p_0 \phi^2}{4},$$

where the last inequality follows from (30) and (36). If this is at least 0.1, then we 
again have that $p_0$ is exponentially small in $1/\phi$. 
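Finally, we consider the case of the term 

$$\left( \sum_{i=0}^{2} p'_i \tilde{q}_{iv} \right) \log_2 \sum_{i=0}^{2} p_i \tilde{q}_{iv}.$$

We have by (29) that 

$$\left| \left( \sum_{i=0}^{2} p'_i \tilde{q}_{iv} \right) \log_2 \sum_{i=0}^{2} p_i \tilde{q}_{iv} \right| \le -\left( \sum_{i=0}^{2} \tilde{q}_{iv} \right) \log_2 \left( \frac{1}{8} \sum_{i=0}^{2} \tilde{q}_{iv} \right),$$

which, since $\sum_{i=0}^{2} \tilde{q}_{iv} \le 120\phi$, can never exceed 0.1 for small $\phi$, as it is of the form 
$-8x \log_2 x$ for a small x. 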

Since $p_0$ is exponentially small in $1/\phi$, we have that the difference between the $C_{1,1}$ 
capacity using only two trines and that using all three trines is exponentially small 
in $1/\sqrt{a}$, showing that our guesses in Section 3 are exponentially close to the correct $C_{1,1}$ 
capacity as a goes to 0, and thus showing that $C_{1,A}$ is strictly larger than $C_{1,1}$ in a 
region near a = 0. 



6 $C_{1,1} = C_{1,A}$ for two pure states 

In this section, we prove that for two pure states, $C_{1,1} = C_{1,A}$. We do this by giving a 
general upper bound on $C_{1,A}$ based on a tree construction. We then use the fact that 
for two pure states, accessible information is concave in the probabilities of the states 
(proved in Appendix B) to show that this upper bound is equal to $C_{1,1}$ for ensembles 
containing only two pure states. 

For the upper bound, we consider a class of trees, with an ensemble of states 
associated with each node. The action of Bob's measurement protocol on a specific 
signal will generate such a tree, and analyzing this tree will bound the amount of 
information Bob can on average extract from that signal. Associated with each tree 
will be a capacity, and the supremum over all trees will give an upper bound for $C_{1,A}$. 

We now describe our tree construction in general. Let us suppose that Alice can 
convey to Bob one of m possible signal states. Let these states be $\rho_i$, where $1 \le i \le m$. 
To each tree node we assign m density matrices and m associated probabilities (these 
will not be normalized, and so may sum to less than 1). For node x of the tree, we 
associate some POVM element $E_x$, and the m density matrices $E_x^{1/2} \rho_i E_x^{1/2} / {\rm Tr}\, E_x \rho_i$, 
where $\rho_i$ are the original signal states. (We may omit the normalization factor of 
${\rm Tr}\, E_x \rho_i$ in this discussion when it is clear from context.) For the root node r, the 
POVM element $E_r$ is the identity matrix I, and the probability $p_{r,i}$ is the probability 
that this signal is $\rho_i$. A probability $p_x$ can be associated with node x by summing 
$p_x = \sum_{i=1}^{m} p_{x,i}$. For the root, $p_r = 1$. For any node x, its associated probability $p_x$ will 
be equal to the sum of the probabilities $p_{y_j}$ associated with its children $y_j$. There are 
two classes of nodes, distinguished by the means of obtaining its children from the node. 
The first class we call measurement nodes and the second we call probability refinement 
(or refinement) nodes. For a measurement node x, we assign to each of the children 
$y_j$ a POVM element $E_{y_j}$, where $\sum_j E_{y_j} = E_x$. The density matrices associated 
with a child of x will be $E_{y_j}^{1/2} \rho_i E_{y_j}^{1/2} / {\rm Tr}\, E_{y_j} \rho_i$, and the probability associated with the 
density matrix $E_{y_j}^{1/2} \rho_i E_{y_j}^{1/2} / {\rm Tr}\, E_{y_j} \rho_i$ will be $p_{y_j,i} = p_{x,i}\, {\rm Tr}(E_{y_j} \rho_i) / {\rm Tr}(E_x \rho_i)$. Finally, 
we define the information gain associated with a node x. This is 0 for nodes which are 
not measurement nodes, and 

where $H(\{q_i\})$ is the Shannon entropy $-\sum_i q_i \log_2 q_i$ of the probability distribution 
$\{q_i\}$. 

We now explain why we chose this formula. We consider applying a measurement
to the ensemble associated with node $x$. This ensemble contains the state
$E_x^{1/2} \rho_i E_x^{1/2} / \mathrm{Tr}(\rho_i E_x)$ with probability $p_{x,i}/p_x$. Let us apply the
measurement that takes $\rho$ to $A_k \rho A_k^\dagger$ with probability $\mathrm{Tr}\, A_k^\dagger A_k \rho$,
where $\sum_k A_k^\dagger A_k = I$. Each child $y_k$ of $x$ is associated with one of the matrices
$A_k$. Let $E_{y_k} = E_x^{1/2} A_k^\dagger A_k E_x^{1/2}$. Then $\sum_k E_{y_k} = E_x$. Now, after we
apply $A_k$ to $E_x^{1/2} \rho_i E_x^{1/2}$, we obtain the state $A_k E_x^{1/2} \rho_i E_x^{1/2} A_k^\dagger$. This
happens with probability

$$\frac{\mathrm{Tr}\, A_k E_x^{1/2} \rho_i E_x^{1/2} A_k^\dagger}{\mathrm{Tr}\, E_x \rho_i} = \frac{\mathrm{Tr}\, E_{y_k} \rho_i}{\mathrm{Tr}\, E_x \rho_i}.$$

The state we obtain, $A_k E_x^{1/2} \rho_i E_x^{1/2} A_k^\dagger$, is unitarily equivalent to
$E_{y_k}^{1/2} \rho_i E_{y_k}^{1/2}$, so this latter state can be obtained by an equivalent
measurement. The information $I_x$ associated with the node $x$ is the probability of
reaching the node times the Shannon information gained by this measurement if the node
is reached. Summing $I_x$ over all the nodes $x$ of the tree gives the expected information
gain by measurement steps.
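The verbal description of the information gain can be turned into a small numerical sketch. It assumes that the gain at a measurement node is the probability $p_x$ of reaching the node times the mutual information between the signal index $i$ and the child index $k$, which is our reading of the paragraph above. The function names and the example numbers are illustrative and are not taken from the paper.

```python
import math

def shannon(probs):
    """Shannon entropy in bits of a normalized distribution."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

def child_probabilities(p_x, likelihoods):
    """p_{y_k,i} = p_{x,i} * Tr(E_{y_k} rho_i) / Tr(E_x rho_i).

    likelihoods[k][i] is the ratio Tr(E_{y_k} rho_i) / Tr(E_x rho_i);
    for each i these ratios sum to 1 over k.
    """
    return [[p * l for p, l in zip(p_x, row)] for row in likelihoods]

def node_information(p_x, children):
    """Assumed gain: p_x * [H(prior) - sum_k (p_{y_k}/p_x) H(posterior_k)].

    p_x and each entry of children are unnormalized lists indexed by signal i.
    """
    total = sum(p_x)
    prior = shannon([p / total for p in p_x])
    posterior = 0.0
    for child in children:
        weight = sum(child)
        if weight > 0:
            posterior += (weight / total) * shannon([p / weight for p in child])
    return total * (prior - posterior)

# Hypothetical root node: two equally likely signal states.
root = [0.5, 0.5]
perfect = child_probabilities(root, [[1.0, 0.0], [0.0, 1.0]])   # fully distinguishing
useless = child_probabilities(root, [[0.5, 0.5], [0.5, 0.5]])   # outcome independent of i
gain_perfect = node_information(root, perfect)
gain_useless = node_information(root, useless)
```

In this sketch a node reached with probability 1/2 that then distinguishes its two states perfectly contributes half a bit, which illustrates the weighting by $p_x$.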

The second class of nodes are probability refinement nodes (which we often shorten
to refinement nodes). Here, for all the children $\{y_k\}$ of $x$, $E_x = E_{y_k}$. We assign
probabilities $p_{y_k,i}$ to the children $y_k$ so that $\sum_k p_{y_k,i} = p_{x,i}$. For this class of
nodes, we define $I_x$ to be 0. These nodes correspond to steps in the protocol where additional
information is gained about one signal state in the codeword by measuring different
signal states in the codeword.

To find the upper bound on the $C_{1,A}$ capacity for a set of states $\{\rho_i\}$, we take the
supremum over the information gain associated with all trees of the above form. That
is, we try to maximize $\sum_x I_x$ over all probability distributions $p_{r,i}$ on the root node $r$,
all ways of splitting $E_x = \sum_k E_{y_k}$ for measurement nodes $x$, and all ways of splitting
probabilities $p_{x,i} = \sum_k p_{y_k,i}$ for refinement nodes $x$ and signal states $i$. (And if this
maximum is not attained, we take the supremum.)

To prove this upper bound, we track the information obtained from a single signal
(i.e., channel output) $S_u$ in the protocol used by Alice and Bob. We assume that Alice
sends Bob a set of states, and Bob performs measurements on them one at a time.
We keep track at all times $t$ (i.e., for all nodes $x$ of the tree) of the probability that
signal $S_u$ is in state $E_x^{1/2} \rho_i E_x^{1/2}$. There are two cases, depending on which signal Bob
measures. In the first case, when Bob measures signal $S_u$, we perform a measurement
on the current tree node $x$ that splits each of the possible values of $p_{x,i}$ for this signal
$S_u$ into several different values. This case corresponds to a measurement node of the
tree. We can assume without loss of generality that for his measurement Bob uses the
canonical type of operators discussed above, so that $E_x^{1/2} \rho_i E_x^{1/2}$ goes to
$E_{y_k}^{1/2} \rho_i E_{y_k}^{1/2}$ with probability $\mathrm{Tr}\, E_{y_k} \rho_i / \mathrm{Tr}\, E_x \rho_i$. Thus, we now have
several different ensembles of density matrices, the $k$th of which contains
$E_{y_k}^{1/2} \rho_i E_{y_k}^{1/2} / \mathrm{Tr}\, E_{y_k} \rho_i$ with (unnormalized) probability
$p_{x,i} \mathrm{Tr}\, E_{y_k} \rho_i / \mathrm{Tr}\, E_x \rho_i = p_{y_k,i}$. In this step Bob can extract some
information about the original codeword, and the amount of this information is at most $I_x$.

The other case comes when Bob measures signals other than $S_u$. These steps can provide
additional information about the signal $S_u$, so if the probability distribution before this
step contained $E_x^{1/2} \rho_i E_x^{1/2}$ with probability $p_{x,i}$, we now have several distributions,
each assigned to a child of $x$; the $j$th distribution contains $E_x^{1/2} \rho_i E_x^{1/2}$ with
(unnormalized) probability $p_{y_j,i}$. Here, we must have $\sum_j p_{y_j,i} = p_{x,i}$. This kind of step
corresponds to a probability refinement node in the tree. The information gained by these
measurement steps can be attributed to the signals that are actually measured in these
steps, so we need not attach any information gain to the refinement steps in the tree
formulation. Averaging the information gain over the trees associated with all the signals
gives the capacity of the protocol, which is the expected information gain per signal sent.

There are several simplifying assumptions we can make about the trees. First, we 
can assume that nodes just above leaves are measurement nodes that contain only rank 
1 projectors, since any refinement node having no measurement nodes below it can be 
eliminated without reducing the information content of the tree, and since the last
measurement might as well extract as much information as possible. We could assume 
that the types of the nodes are alternating, since two nodes of the same type, one a child 
of the other, can be collapsed into one node. In the sequel, we will perform this collapse 
on the measurement nodes, so we assume that all the children of measurement nodes 
are refinement nodes. We could also (but not simultaneously) assume that every node 
has degree two, since any measurement with more than two outcomes can be replaced 
with an equivalent sequence of measurements, each having only two outcomes, and any 
split in probabilities can be replaced by an equivalent sequence of splits. In the sequel 
we will assume that all the probability refinement nodes are of degree two. 
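The claim that a measurement with more than two outcomes can be replaced by a sequence of two-outcome measurements without changing the information content can be checked numerically. The sketch below assumes that the gain of a measurement node is $p_x$ times the mutual information between the signal index and the child index. The node probabilities are hypothetical values chosen only for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a normalized distribution."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

def gain(parent, children):
    """Assumed node gain for unnormalized probability lists indexed by signal."""
    p_x = sum(parent)
    h_children = sum(
        sum(c) / p_x * entropy([v / sum(c) for v in c])
        for c in children if sum(c) > 0
    )
    return p_x * (entropy([v / p_x for v in parent]) - h_children)

# Hypothetical measurement node: two signal states, three outcomes y1, y2, y3.
parent = [0.5, 0.5]
y1, y2, y3 = [0.30, 0.05], [0.15, 0.15], [0.05, 0.30]

# All three outcomes obtained in a single measurement node.
one_step = gain(parent, [y1, y2, y3])

# The same outcomes in two binary stages: y1 versus (y2 or y3), then y2 versus y3.
y23 = [a + b for a, b in zip(y2, y3)]
two_step = gain(parent, [y1, y23]) + gain(y23, [y2, y3])
```

The two totals agree because the intermediate entropy term for the merged child cancels between the two stages, which is the chain rule for entropy.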

One interesting question is whether any tree of this form has an associated protocol. 
The upper bound will hold whether or not this is the case, but if there are trees with no 
associated protocols, the bound may not be tight. We do not know the answer to this, 
but suspect that there are trees with no associated protocols. Our (vague) intuition is 
that if the root node is a measurement node with no associated information gain, and 
all of the children of this node are refinement nodes, there appears to be no way to 
obtain the information needed to perform one of these refinement steps without also 
obtaining the information needed to perform all the other refinement steps of the
root node. However, making this much information available at the top node would 
reduce the information that could be obtained using later measurement nodes. It is 
possible that this difficulty can be overcome if there is a feedback channel available 
from the receiver to the sender. We thus boldly conjecture 

Conjecture 1 If arbitrary use of a classical feedback channel from the receiver to the
sender is available for free, then the adaptive capacity with feedback $C_{1,AF}$ is given by
the supremum over all trees of the above type of the information associated with that
tree.

As mentioned above, the supremum of the extractable information over all trees is 
an upper bound on $C_{1,A}$, since it is at least as large as the information corresponding to
any possible adaptive protocol. We now restrict our discussion to the case of ensembles 
consisting of two pure states, and prove that in this case we have equality, since both 
of these bounds are equal to the $C_{1,1}$ capacity. Consider a tree which gives a good
information gain for this ensemble (we would say maximum, but have no proof that 
the supremum is obtainable). There must be a deepest refinement node, so all of its 
descendants are measurement nodes. We may without loss of generality assume that
this deepest refinement node has only two children. Each of these two children has an 
associated ensemble consisting of two pure states with some probabilities. The max- 
imum information tree will clearly assign the optimum measurement to these nodes. 
However, an explicit expression for this optimum measurement is known [HI I1UI fTTl 112] . 
and as is proved in Appendix B, the accessible information for ensembles of two given 
pure states is concave in the probabilities of the states. Thus, if we replace this re- 
finement node with a measurement node, we obtain a tree with a higher associated 
information value. Using induction, we can perform a series of such steps which do not 
decrease the information gain associated with the tree while collapsing everything to 
a single measurement. Thus, for two pure states, we have $C_{1,1} = C_{1,A}$.
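The collapsing step relies only on concavity, and a small numerical sketch may make it concrete. It assumes the standard closed form for the accessible information of two pure states, $H(p) - H(\frac12 + \frac12\sqrt{1 - 4\kappa p(1-p)})$ with $\kappa = |\langle v_1|v_2\rangle|^2$, as quoted in Appendix B. The overlap, the refinement weights and the child priors are hypothetical values chosen for illustration.

```python
import math

def h(t):
    """Binary Shannon entropy in bits."""
    if t <= 0 or t >= 1:
        return 0.0
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)

def accessible_information(p, kappa):
    """Accessible information for two pure states with priors p and 1 - p."""
    return h(p) - h(0.5 + 0.5 * math.sqrt(1 - 4 * kappa * p * (1 - p)))

kappa = 0.25          # squared overlap of the two states (illustrative)
p = 0.5               # prior at the refinement node, after normalization

# Hypothetical refinement: two children with weights w and 1 - w whose
# priors p1 and p2 average back to p, followed by optimal measurements.
w, p1, p2 = 0.5, 0.2, 0.8
refined = w * accessible_information(p1, kappa) + (1 - w) * accessible_information(p2, kappa)

# Replacing the refinement node by a single optimal measurement.
direct = accessible_information(p, kappa)
```

With these numbers the direct measurement yields about 0.645 bits, while the refined tree yields about 0.472 bits, so removing the refinement node does not lose information in this instance.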
The above argument would work to show that $C_{1,1} = C_{1,A}$ for ensembles consisting
of two arbitrary density matrices if we could show that the accessible information
for two arbitrary density matrices is concave in the probabilities of these two density 
matrices. It would seem intuitively that this should be true, but we have not been 
able to prove it. It may be related to the conjecture ^HH] that the optimal accessible 
information for two arbitrary density matrices can always be achieved by a von Neu- 
mann measurement. This has been proved in two dimensions and is supported by 
numerical studies in higher dimensions .4-. We thus conjecture: 

Conjecture 2 $C_{1,A} = C_{1,1}$ for two mixed states in arbitrary dimensions.

In fact, the proof in this section will work for any upper bound on accessible infor- 
mation which has both the concavity property and the property that if a measurement 
is made on the ensemble, the sum of the information extracted by this measurement 
and the expected upper bound for the resulting ensemble is at most the original upper 
bound. The Fuchs-Caves bound |Hj (which was Holevo's original bound) may have 
these properties; we have done some numerical tests and have not found a counterex- 
ample. For 3 planar trine states with equal probabilities, this gives an upper bound of 
approximately 0.96. 
Figure 12: The tree corresponding to the best adaptive protocol of Section^ To simplify the
diagram, we do not give the probabilities and states in the ensembles of the leaves of this tree, 
which are represented here by empty boxes. Since they are reached by the final measurement, 
which projects onto a rank 1 density matrix, the quantum states corresponding to these nodes 
are now completely reduced, and no further information can be extracted from these ensembles. 
The probabilities can be computed from the discussion in Section 
7 Discussion



If we force Bob to measure his signals sequentially, so that he must complete his mea- 
surement on signal k before he starts measuring signal k + 1 (even if he can adaptively 
choose the order he measures the signals in and even if a feedback channel is applied 
from Bob to Alice), Bob can never achieve a capacity greater than $C_{1,1}$. This can easily
be seen. Without decreasing the capacity, we assume that Bob uses a feedback channel 
to send all the information that he has back to Alice. This information consists of the 
results of the measurement and the measurement that he plans to perform next. The 
ensemble of signals that Alice now sends Bob can convey no more information than 
the optimal set of signals for this measurement. However, it now follows from classical 
information theory that such a protocol can never have a capacity greater than the 
sum of the optimal information gains for all these measurements, which is at most $C_{1,1}$.

It is thus clear that the advantage of adaptive protocols is obtained from the fact 
that Bob can adjust subsequent measurements of a signal depending on the outcome 
of the first round of measurements on the entire codeword. In the information decision 
tree of Section 6 for our protocol (see Figure 12), the crucial fact is that we first
either project each of the trine states into the plane or lift it up. We then arrange to 
distinguish between only two possible states for those signals that were projected into 
the plane, and among all three possible states for those signals that were lifted. 

As we showed in Section 6, $C_{1,1} = C_{1,A}$ for two pure states, and this proof can be
extended to apply to two arbitrary states if a very plausible conjecture on the accessible 
information for a two-state ensemble holds. For three states, even in two dimensions, 
the same upper bound proof cannot apply. However, for three states in two dimensions, 
it may still be that $C_{1,1} = C_{1,A}$. We have unsuccessfully tried to find strategies that
perform better than the $C_{1,1}$ capacity for the three planar trine states, and we now
suspect that the adaptive capacity is the same as the $C_{1,1}$ capacity in this case, and
that this is also the case for arbitrary sets of pure states in two dimensions. 

Conjecture 3 For an arbitrary set of pure states in two dimensions, $C_{1,1} = C_{1,A}$, and
in fact, this capacity is achievable by using as signal states the two pure states in the 
ensemble with inner product closest to 0. 

For general situations, we know very little about $C_{1,A}$. In fact, we have no good
criterion for deciding whether $C_{1,A}$ is strictly greater than $C_{1,1}$. Another question
is whether entangled inputs could improve the adaptive capacity, that is, whether
$C_{1,A} = C_{\infty,A}$, where $C_{\infty,A}$ is the capacity given entangled inputs and single-signal, but
adaptive, measurements.
Appendix A: The $C_{1,1}$ capacity for the planar trines



Next, we discuss the $C_{1,1}$ capacity for trines in the plane. For Section 3 we needed
to show two things. First, that $C_{1,1}$ for the planar trines was maximized using the
probability distribution $\Pi_2 = (0, \frac12, \frac12)$, and second, that any protocol with capacity
close to $C_{1,1}$ must use nearly the same probability distribution and measurement as the
optimum protocol achieving $C_{1,1}$. We show that in the neighborhood of the probability
distribution $\Pi_2$, the optimum measurement for accessible information contains only two
projectors. From this proof, both facts can be easily deduced; we provide a proof of
the first, and the second follows easily from an examination of our proof.

We first show that if an optimum measurement for accessible information has only k 
projectors, then at most k different input states are needed to achieve optimality. This 
result is a known classical result; for completeness, we provide a brief proof. Shannon's
formula for the capacity of a classical channel is the entropy of the average output 
less the average entropy of the output. It follows that the number of input states of 
a classical channel needed to achieve optimality never exceeds the number of output 
states. If there are k outcomes, and k' > k input states, then the output probability 
distribution can be held fixed on a $(k' - k)$-dimensional subspace of the input probability
distributions. By the linearity of the average entropy, the minimum average entropy 
can be achieved at a point of that subspace which has only k non-zero probabilities on 
the input states. Thus, if the optimal measurement is a von Neumann measurement, 
only two trines are required to achieve optimality. 

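We associate to each projector $v_\theta = (\cos\theta, \sin\theta)$ an information quantity depending
on the probability distribution $\Pi = (p_0, p_1, p_2)$, namely

$$I_\Pi(\theta) = -\left( \sum_{i=0}^{2} p_i q_{i,\theta} \right) \log_2 \sum_{i=0}^{2} p_i q_{i,\theta} + \sum_{i=0}^{2} p_i q_{i,\theta} \log_2 q_{i,\theta},$$

where $q_{i,\theta} = |\langle T_i | v_\theta \rangle|^2$. The accessible information for a measurement using POVM
elements $r_j |v_{\theta_j}\rangle\langle v_{\theta_j}|$ is $\sum_j r_j I_\Pi(\theta_j)$. Now, we need to find the projectors
that form a POVM and maximize the accessible information. If we have projectors $v_{\theta_j}$ with
associated weights $r_j$, the constraints that the projectors form a POVM are: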

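$$\sum_j r_j \cos^2\theta_j = 1 \qquad (37)$$

$$\sum_j r_j \sin^2\theta_j = 1 \qquad (38)$$

$$\sum_j r_j \sin\theta_j \cos\theta_j = 0. \qquad (39)$$

These constraints are equivalent to

$$\sum_j r_j = 2 \qquad (40)$$

$$\sum_j r_j \cos 2\theta_j = 0 \qquad (41)$$

$$\sum_j r_j \sin 2\theta_j = 0. \qquad (42)$$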
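The equivalence of the two sets of constraints follows from the double-angle identities. The short derivation below is a sketch written with weights $r_j$ and angles $\theta_j$, matching the notation used for the projectors $v_{\theta_j}$; it adds (37) and (38), subtracts them, and doubles (39).

```latex
\begin{align*}
\sum_j r_j \cos^2\theta_j + \sum_j r_j \sin^2\theta_j &= \sum_j r_j = 2, \\
\sum_j r_j \cos^2\theta_j - \sum_j r_j \sin^2\theta_j &= \sum_j r_j \cos 2\theta_j = 0, \\
2 \sum_j r_j \sin\theta_j \cos\theta_j &= \sum_j r_j \sin 2\theta_j = 0.
\end{align*}
```

Conversely, half the sum and half the difference of (40) and (41) recover (37) and (38), and half of (42) recovers (39), so the two systems have the same solutions.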
We wish to find projectors such that $\sum_j r_j I_\Pi(\theta_j)$ is maximum, given the linear
constraints (40)-(42). This is a linear programming problem. The duality theorem of linear
programming says that this maximum is equal to twice the minimum $\alpha$ for which
there is a $\sigma$ and a $\beta$ such that the inequality

$$\alpha + \beta \sin(2\theta + \sigma) \ge I_\Pi(\theta) \qquad (43)$$

holds for all $\theta$. (The factor of 2 comes from the right hand side of Eq. (40).) It is easy
to see that the sine function of (43) and the function $I_\Pi(\theta)$ are either tangent at two
values of $\theta$ differing by $\pi/2$, or are tangent at three values of $\theta$ (or more, in degenerate
cases), as otherwise a different sine function with a smaller $\alpha$ would exist. If they are
tangent at two points, then the optimal measurement is a von Neumann measurement,
as it contains only two orthogonal projectors.

For the probability distribution $\Pi_2 = (0, \frac12, \frac12)$, the two functions $I_{\Pi_2}(\theta)$ and

$$\frac12 \left( 1 - H\left( \frac12 + \frac{\sqrt3}{4} \right) \right) - \frac14 \sin(2\theta - \pi/2) \qquad (44)$$

are plotted in Figure 13. One can see that the sine function is greater than the function
$I_{\Pi_2}(\theta)$, and the functions are tangent at the two points $\theta = \pi/4$ and $\theta = 3\pi/4$, which
differ by $\pi/2$. Hence, with $H(1/2 + \sqrt3/4) = 0.35458$ bits, the linear program has an
optimum of $1 - H(1/2 + \sqrt3/4) = 0.64542$ bits, and the optimal measurement is the von
Neumann measurement with projectors $v_{\pi/4}$ and $v_{3\pi/4}$, yielding 0.64542 bits of
accessible information.
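The tangency claim can be checked numerically. The sketch below assumes the planar trines are the unit vectors at angles $0$, $2\pi/3$ and $4\pi/3$, and that the per-projector quantity is $I_\Pi(\theta) = -(\sum_i p_i q_{i,\theta}) \log_2 \sum_i p_i q_{i,\theta} + \sum_i p_i q_{i,\theta} \log_2 q_{i,\theta}$ with $q_{i,\theta} = |\langle T_i|v_\theta\rangle|^2$. Both are our assumptions about the conventions in use, and the function names are illustrative.

```python
import math

# Assumed orientation of the planar trines T_0, T_1, T_2.
TRINE_ANGLES = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]

def xlog2x(x):
    """x * log2(x), extended by continuity to x = 0."""
    return x * math.log2(x) if x > 0 else 0.0

def binary_entropy(t):
    return -xlog2x(t) - xlog2x(1 - t)

def info(theta, probs):
    """Assumed per-projector information quantity I_Pi(theta)."""
    q = [math.cos(theta - a) ** 2 for a in TRINE_ANGLES]
    s = sum(p * qi for p, qi in zip(probs, q))
    return -xlog2x(s) + sum(p * xlog2x(qi) for p, qi in zip(probs, q))

PI2 = [0.0, 0.5, 0.5]
alpha = 0.5 * (1 - binary_entropy(0.5 + math.sqrt(3) / 4))

def bound(theta):
    """alpha - (1/4) sin(2 theta - pi/2), written as alpha + (1/4) cos(2 theta)."""
    return alpha + 0.25 * math.cos(2 * theta)

# Two orthogonal projectors, each with weight 1.
accessible = info(math.pi / 4, PI2) + info(3 * math.pi / 4, PI2)

# Smallest gap between the sine curve and I_Pi on a grid over [0, pi).
gap = min(bound(k * math.pi / 3600) - info(k * math.pi / 3600, PI2) for k in range(3600))
```

Under these assumptions `accessible` evaluates to roughly 0.64542 bits and `gap` is zero up to rounding, consistent with the sine curve touching $I_{\Pi_2}$ without crossing it.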

We wish to show that for all probability distributions II near II2, the two functions 
behave similarly to the way they behave in Figure 13. As the details of this calculation
are involved and not particularly illuminating, we leave them out, and merely sketch 
the outline of the proof. 

The first step is to show that for any function $I_\Pi(\theta)$ obtained using a probability
distribution $\Pi$ close to $(0, \frac12, \frac12)$, there is a sine function close to the original sine
function (44) which always exceeds $I_\Pi(\theta)$ and is tangent to it at two points in regions near
$\theta = \pi/4$ and $\theta = 3\pi/4$. We do this by finding values $\theta_1$ and $\theta_2$ in these regions which
differ by $\pi/2$ and such that the derivative $I'_\Pi(\theta) = dI_\Pi(\theta)/d\theta$ evaluated at $\theta_1$ and
$\theta_2$ has equal absolute values but opposite signs; these two points define the sine function.
We show that these two points must exist by finding an $\epsilon$ such that

$$I'_\Pi(\pi/4 - \epsilon) + I'_\Pi(3\pi/4 - \epsilon) > 0, \quad \text{and}$$
$$I'_\Pi(\pi/4 + \epsilon) + I'_\Pi(3\pi/4 + \epsilon) < 0,$$

and using the continuity of the first derivative of $I_\Pi(\theta)$. This $\epsilon$ is calculated by using
the fact that if the probability distribution $\Pi$ is close to $\Pi_2$, then $I'_\Pi$ is close to $I'_{\Pi_2}$.

To show that $v_{\theta_1}$ and $v_{\theta_2}$ are indeed the optimal projectors for the probability
distribution $\Pi$, we need to show that except at the points $\theta_1$ and $\theta_2$, the sine function
we have found is always greater than $I_\Pi(\theta)$. We do this in two steps. First, we show that
the sine function is greater than $I_\Pi(\theta)$ in the regions far from the points of tangency.
This can be done using fairly straightforward estimation techniques, since outside of
two regions centered around the values $\theta = \pi/4$ and $\theta = 3\pi/4$, these functions do not
approach each other closely. Second, we show that the second derivative of the sine
function is strictly greater than the second derivative $d^2 I_\Pi(\theta)/d\theta^2$ in the two regions
near the points of tangency. This shows that the function $I_\Pi(\theta)$ cannot meet the sine
function in more than one point in each of these regions.

Our calculations show that for probability distributions within 0.001 of $(0, \frac12, \frac12)$ in
the $L_1$ norm, there are only two points of tangency. Recall the fact that to achieve
optimal capacity, classical channels never require more input states than output states.
This shows that using the same measurement and adjusting the smallest probability
in $\Pi$ to be 0 will improve the accessible information, showing that this accessible
information is at most that achievable using only two trines, namely
$1 - H(1/2 + \sqrt3/4) = 0.64542$. We need now only show that for points outside this region, the
accessible information cannot approach 0.64542; while we have not done this rigorously,
the graph of Figure 14, and similar graphs showing in more detail the regions near the
points of tangency, are extremely strong evidence that this is indeed the case. In fact,
numerical experiments appear to show that if the minimum probability of a trine state
is less than 0.06499, then there are only two projectors in the optimal measurement;
the probability distribution containing the minimum probability and requiring three
projectors is approximately (0.065, 0.4675, 0.4675).
Figure 13: The red curve is $I(\theta)$ for the probability distribution $(0, \frac12, \frac12)$. The green curve is
$\frac12 (1 - H(\frac12 + \frac{\sqrt3}{4})) + \frac14 \cos 2\theta$. These curves are tangent at the points $\pi/4$ and $3\pi/4$, showing
that the optimum measurement for accessible information is the von Neumann measurement
with projectors $v_{\pi/4}$ and $v_{3\pi/4}$. It yields $1 - H(\frac12 + \frac{\sqrt3}{4})$ bits of accessible information.



Appendix B: Concavity of accessible information on two
pure states

For Section 6 we needed a proof that the accessible information on two pure states $v_1$
and $v_2$ is a concave function in the probabilities of these pure states. As opposed to
the rest of the paper, all logarithms in this section will have base $e$.
We first prove an inequality that will be used later. For $0 \le x < 1$,

$$F(x) = \frac{2x}{1 - x^2} - \log\left( \frac{1+x}{1-x} \right) \ge 0. \qquad (45)$$

It is easy to see that for $x = 0$, both terms are 0. Differentiating and simplifying, we
get

$$F'(x) = \frac{4x^2}{(1 - x^2)^2},$$

which is positive for $0 < x < 1$, so $F(x) > 0$ in this range.
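The differentiation step can be spelled out as follows. This sketch assumes $F(x) = \frac{2x}{1-x^2} - \log\frac{1+x}{1-x}$, which is the form of inequality (45) consistent with the stated derivative and with its later use at $x = \sqrt{R}$.

```latex
\begin{align*}
\frac{d}{dx}\,\frac{2x}{1-x^2} &= \frac{2(1-x^2) + 4x^2}{(1-x^2)^2} = \frac{2 + 2x^2}{(1-x^2)^2}, \\
\frac{d}{dx}\,\log\frac{1+x}{1-x} &= \frac{1}{1+x} + \frac{1}{1-x} = \frac{2}{1-x^2} = \frac{2 - 2x^2}{(1-x^2)^2}, \\
F'(x) &= \frac{(2 + 2x^2) - (2 - 2x^2)}{(1-x^2)^2} = \frac{4x^2}{(1-x^2)^2}.
\end{align*}
```

Since $F(0) = 0$ and $F'(x) > 0$ on $0 < x < 1$, the function is strictly positive on that interval.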

We now prove that the accessible information is a concave function in $p$ for an
ensemble consisting of two pure states, $|v_1\rangle$ with probability $p$ and $|v_2\rangle$ with probability
$1 - p$. The formula for this accessible information is

$$I_{acc} = H(p) - H\left( \frac12 + \frac12 \sqrt{1 - 4\kappa p(1-p)} \right),$$

where $\kappa = |\langle v_1 | v_2 \rangle|^2$, and $H$ is the Shannon entropy function (which we will take to the
base $e$ in this section). Proofs of this formula can be found in |1U1 I31IT2*]. Substituting
$q = p - 1/2$, we get

$$I_{acc} = H\left( \frac12 + q \right) - H\left( \frac12 + \frac12 \sqrt{1 - \kappa(1 - 4q^2)} \right). \qquad (46)$$

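We wish to show that the second derivative of this quantity is negative with respect
to $q$, for $-\frac12 < q < \frac12$. Let

$$R = 1 - \kappa + 4\kappa q^2,$$

which is the quantity under the radical sign in Eq. (46).
We now differentiate $I_{acc}$ twice with respect to $q$ and obtain

$$\frac{d^2 I_{acc}}{dq^2} = H''\left( \frac12 + q \right) - \left( \frac{2\kappa q}{\sqrt{R}} \right)^2 H''\left( \frac12 + \frac12 \sqrt{R} \right) - \frac{2\kappa(1-\kappa)}{R^{3/2}} H'\left( \frac12 + \frac12 \sqrt{R} \right)$$

$$= -\frac{4}{1 - 4q^2} + \frac{4 q^2 \kappa^2}{R} \cdot \frac{4}{\kappa(1 - 4q^2)} - \frac{2\kappa(1-\kappa)}{R^{3/2}} \ln \frac{1 - \sqrt{R}}{1 + \sqrt{R}}$$

$$= \frac{2(1-\kappa)}{(1 - 4q^2) R^{3/2}} \left[ -2R^{1/2} + \kappa(1 - 4q^2) \ln \frac{1 + \sqrt{R}}{1 - \sqrt{R}} \right],$$

which quantity we wish to show is negative.
We thus need to show that

$$\kappa(1 - 4q^2) \ln \frac{1 + \sqrt{R}}{1 - \sqrt{R}} \le 2\sqrt{R}.$$

Since $\kappa(1 - 4q^2) = 1 - R$, this is equivalent to

$$\ln \frac{1 + \sqrt{R}}{1 - \sqrt{R}} \le \frac{2\sqrt{R}}{1 - R}.$$

However, this is the inequality (45) proven above, with $x = \sqrt{R}$, so we are done.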
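As a sanity check on the concavity argument, the closed form of the second derivative can be compared with a finite-difference estimate. The sketch assumes $I_{acc}(q) = H(\frac12 + q) - H(\frac12 + \frac12\sqrt{R})$ with $R = 1 - \kappa + 4\kappa q^2$ and natural logarithms, and it assumes the factored second derivative $\frac{2(1-\kappa)}{(1-4q^2)R^{3/2}}\left[-2\sqrt{R} + \kappa(1-4q^2)\ln\frac{1+\sqrt{R}}{1-\sqrt{R}}\right]$ obtained from the chain rule. The sample points and step size are arbitrary illustrative choices.

```python
import math

def H(t):
    """Binary entropy in nats."""
    if t <= 0 or t >= 1:
        return 0.0
    return -t * math.log(t) - (1 - t) * math.log(1 - t)

def i_acc(q, kappa):
    """Accessible information as a function of q = p - 1/2 (assumed form)."""
    return H(0.5 + q) - H(0.5 + 0.5 * math.sqrt(1 - kappa * (1 - 4 * q * q)))

def second_derivative(q, kappa):
    """Factored closed form of the second derivative in q (assumed form)."""
    R = 1 - kappa + 4 * kappa * q * q
    root = math.sqrt(R)
    bracket = -2 * root + kappa * (1 - 4 * q * q) * math.log((1 + root) / (1 - root))
    return 2 * (1 - kappa) / ((1 - 4 * q * q) * R ** 1.5) * bracket

def finite_difference(q, kappa, step=1e-4):
    """Central second difference of i_acc in q."""
    return (i_acc(q + step, kappa) - 2 * i_acc(q, kappa) + i_acc(q - step, kappa)) / step ** 2

samples = [(q, kappa) for q in (-0.3, 0.0, 0.2) for kappa in (0.1, 0.5, 0.9)]
results = [(second_derivative(q, k), finite_difference(q, k)) for q, k in samples]
```

At every sample point the two estimates agree closely and are negative, as the concavity claim requires; the values $\kappa = 0$ and $\kappa = 1$ are excluded because the logarithm diverges or the expression vanishes identically there.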

Appendix C: Accessible information for various $\alpha$.

In this section, we give graphs of the accessible information for various values of $\alpha$.
These should be compared with Figure 14, which gives the graph for the planar trines,
with $\alpha = 0$. This illustrates the origin of the behavior of the two crossing curves
giving the C± t i capacity in Figure El The line BZ gives the value of the local maximum 
at the central point $(\frac13, \frac13, \frac13)$, while the curve CY gives the behavior of the three
local maxima symmetric with $(p, (1-p)/2, (1-p)/2)$. It appears from numerical experiments
that this local maximum is achieved (or nearly achieved) using only three projectors for
$\alpha < 0.27$. At a value of $\alpha$ slightly above 0.27, the assumption that this local maximum
is attained using a von
Neumann measurement becomes false, and the curve of Figure EJ which appears to give 
a local maximum of the information attainable using von Neumann measurements, no 
longer corresponds to a local maximum of the accessible information. Note also that
at the value $\alpha = 0.27$, a POVM containing six projectors is required to achieve the
$C_{1,1}$ capacity, even though there are only three 3-dimensional states in the ensemble.




Figure 14: The accessible information for the planar trines. The probability distributions are 
represented by a triangle where the vertexes correspond to probability distributions (1,0,0), 
(0, 1, 0) and (0, 0, 1). There are four local maxima, three at probability distributions symmetric
with $(0, \frac12, \frac12)$, and one at $(\frac13, \frac13, \frac13)$. This was computed using a linear program, considering
as possible POVM elements the projectors $(\cos\theta, \sin\theta)$, for 36,000 evenly spaced values of $\theta$.
The linear programming package CPLEX was used to calculate the optimum for all probability 
distributions of the form (-^, ^,), and this graph was drawn by interpolating from these 
values. We estimate the error for each of these points (j^, <^) to be less than 10~ 5 . 
Figure 15: The accessible information for the trines with $\alpha = 0.009$; as in Fig. 14, the
probability distributions are represented by a triangle where the vertexes correspond to proba- 
bility distributions (1,0,0), (0,1,0) and (0,0,1). The maximum at (0,0.5,0.5) (for the planar 
trines) has moved slightly away from the edge; the maximum value now occurs roughly at 
(0.0012, 0.4994, 0.4994). The local maximum at $(\frac13, \frac13, \frac13)$ is growing larger with respect to the
global maximum. This, and Figures 16 and 17, were computed using the linear programming
package CPLEX to determine the optimal measurement for points of the form (^, ^j, ^), 
using projectors chosen from 96,000 vectors distributed around the unit sphere. 




Figure 16: The accessible information for the trines with $\alpha = 0.018$; as in Fig. 14, the
probability distributions are represented by a triangle where the vertexes correspond to probability
distributions (1,0,0), (0,1,0) and (0,0,1). Here, the three local maxima have moved farther
in from the edges, and now occur at points symmetric with $(p, \frac{1-p}{2}, \frac{1-p}{2})$ for $p \approx 0.027$; all four
local maxima are now nearly equal in value.
Figure 17: The accessible information for the trines with $\alpha = 0.027$; as in Fig. 14, the
probability distributions are represented by a triangle where the vertexes correspond to probability
distributions (1, 0, 0), (0, 1, 0) and (0, 0, 1). There are still four local maxima, where the ones on
the shoulders occur at points symmetric with $(p, \frac{1-p}{2}, \frac{1-p}{2})$ for $p \approx 0.105$; these will disappear
before $\alpha$ reaches 0.0275.